OPHI Document title: ophi/content use cases
Version: @@@
Date: 11/5/2003
Status: Draft
Document: openphi.mit.edu/.... for
Please send any comments on the document to gherman@mit.edu
Author of this version: Licensing group
For more information check the Open Process Handbook Initiative at: http://ccs.mit.edu/ophi/

Methods for publicizing content

When we want to make content publicly available, there is quite possibly a flow dependency: information has to get from the creator to the reader. I use our typical right thing/right place/right time breakdown to tease out the options.

Flow dependency of desired content being publicly accessible:

Right Thing: Desired content

Standard entries:

What standard?

Is there a minimum acceptable entry? E.g., must there be descriptive text?

Custom entries:

Is any translation to an acceptable standard done, and if so, by whom and when? E.g., if a German user creates an entry in German, is that okay, or does it need to be translated by the author or an editor?

Right Place: Ways in which the data could be accessed

Move the entire database
Move a portion of the database (PIF is one method for this)
Move a set of changes to a database (scripting)
Allow outside access to the data

Right Time: Timing for the access

Push entries at variable intervals/amounts (Author pushes to a central site)
Push entries at set intervals/amounts (Author pushes to a central site)
Pull entries at variable intervals (Central site pulls from authors)
Pull entries at set intervals (Central site pulls from authors)
Continual access (transient- webserver environment)
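
As a rough illustration, the Right Place and Right Time lists above can be treated as two independent choices and enumerated mechanically. The Python sketch below is only one way to model this; the names Place, Timing and delivery_options are hypothetical and do not come from any OPHI or Phios software, and not every pairing it produces is necessarily sensible.

```python
from enum import Enum
from itertools import product


class Place(Enum):
    """Right Place: ways in which the data could be accessed."""
    ENTIRE_DATABASE = "move the entire database"
    PORTION = "move a portion of the database (e.g. PIF)"
    CHANGE_SET = "move a set of changes to a database (scripting)"
    OUTSIDE_ACCESS = "allow outside access to the data"


class Timing(Enum):
    """Right Time: timing for the access."""
    PUSH_VARIABLE = "author pushes at variable intervals/amounts"
    PUSH_SET = "author pushes at set intervals/amounts"
    PULL_VARIABLE = "central site pulls at variable intervals"
    PULL_SET = "central site pulls at set intervals"
    CONTINUAL = "continual access (webserver environment)"


def delivery_options():
    """Return every (place, timing) pairing as a list of tuples."""
    return list(product(Place, Timing))


if __name__ == "__main__":
    for place, timing in delivery_options():
        print(f"{place.value} / {timing.value}")
```

The variants discussed below can be read as particular points in this grid, combined with a choice of who edits and when.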

Variants

Central database, post hoc editing (e.g. Wikipedia)

This provides easy expansion of a central database, but the quality of entries will be wide-ranging. The issues of porn, commercial advertising, sabotaging competitors, etc. are huge. This is the highest-risk option for whoever is accountable for the central site, especially until new features are added to the software per Tim’s comments ("A word of warning, the Wikipedia model works because of the way that the software is designed. The trace of the changes encourage people to make useful contributions. The ease with which unwanted changes can be backed out makes it easy to recover from malicious users. The handbook as it stands has neither of these features."). This variant might be used as an ‘unbranded’ testbed for generating lots of entries. Entries are added directly by authors. Note that this means they need not have downloaded any software or content and may be beyond the scope of a licensing agreement.

Central database, pre hoc editing (e.g. DMOZ or current EPH)

This provides much better quality control, but imposes a choke point: every entry needs to be edited. Case in point: ODP has a 6-month waiting period before editorial review, even with 57,000+ editors. The following link provided by Jintae (http://www.wikipedia.org/wiki/Open_Directory_Project/Temp#Editing_model) details a lot of concerns. The biggest issue here is managing the editorial process. This variant is probably best used for the ‘branded’ database. Entries are submitted by authors sending PIFs or scripts to the editors, who review and enter them.

Decentralized webserver access (e.g. provide a URL)

This is the easiest option by far in a webserver software environment. Anyone who downloads the webserver software to create content provides a URL to their webserver. This provides immediate access to the widest range of entries without the downside of accountability in a central database for inappropriate entries. If they want their entries in the ‘branded’ database they need to send them via PIF or script. A link to their site from the appropriate entries in the ‘branded’ db would also provide visibility. The two issues here are what happens if the webserver becomes unavailable (e.g. the student graduates) and what to do if we want to include the entries in a branded db. Perhaps users both provide URLs and also send, or provide the ability to download, PIFs/scripts/dbs at set intervals (quarterly or every 6 months would be my intuition).

Decentralized editorial environment (e.g. TNG or custom software environment)

This would be a case where interested parties get PIFs/scripts directly from user sites. A reason for this may be that the user has modified the software such that entries (or at least the portion that makes them interesting) are not readable by anything other than the custom software. The biggest issue here is ensuring continuity of access as users move on to other jobs. Again, maybe the user sends a copy of the environment at set intervals in addition to providing files directly to those interested. This variant may be best for computational scenarios.

Software environments

I use the following acronyms for software used. I’m assuming current functionality in these cases. Obviously enhanced software could allow other options.

TNG – MIT’s editor that can PIF

RPH – MIT’s research webserver

EPH – 3 year old Phios webserver with editor that can PIF

PH – current Phios webserver

PH.NET – forthcoming .net version (Q1 ’04?)

PH.NET2 – second wave .net version (Q2 ’04?)

Variants available under various environments

Environment  Use as Central post hoc   Use as Central pre hoc            Decentralized web  Decentralized editor
TNG / RPH    Not viable                Load PIFs; must run RPH           Must run RPH       Provide PIF
EPH          Requires changes per Tim  Load PIFs                         Post URL           Provide PIF
PH           Requires changes per Tim  Need new features to accept PIFs  Post URL           Need new feature to provide PIF
PH.NET       No webserver              No webserver                      No webserver       Provide XPL
PH.NET2      Maybe has changes         Load XPL, PIF, etc.               Post URL           Provide XPL
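
To make the table easier to query, it could be held as a simple lookup. The Python sketch below copies the cells from the table above; the names VARIANTS, SUPPORT, BLOCKED_PREFIXES and not_blocked are hypothetical. Treating cells such as "Not viable" or "Requires changes per Tim" as blocked is my own reading of the table and an assumption, not something the software itself reports.

```python
# Column order matches the table above.
VARIANTS = (
    "central post hoc",
    "central pre hoc",
    "decentralized web",
    "decentralized editor",
)

# Cell text copied from the table, one tuple per software environment.
SUPPORT = {
    "TNG / RPH": ("Not viable", "Load PIFs; must run RPH", "Must run RPH", "Provide PIF"),
    "EPH": ("Requires changes per Tim", "Load PIFs", "Post URL", "Provide PIF"),
    "PH": ("Requires changes per Tim", "Need new features to accept PIFs",
           "Post URL", "Need new feature to provide PIF"),
    "PH.NET": ("No webserver", "No webserver", "No webserver", "Provide XPL"),
    "PH.NET2": ("Maybe has changes", "Load XPL, PIF, etc.", "Post URL", "Provide XPL"),
}

# Assumption: a cell starting with one of these phrases means the variant
# cannot be used with the functionality described in the table.
BLOCKED_PREFIXES = ("Not viable", "No webserver", "Requires changes", "Need new", "Maybe")


def not_blocked(variant):
    """Return {environment: cell text} for environments not blocked for `variant`."""
    col = VARIANTS.index(variant)
    return {
        env: cells[col]
        for env, cells in SUPPORT.items()
        if not cells[col].startswith(BLOCKED_PREFIXES)
    }


if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant, "->", sorted(not_blocked(variant)))
```

Under this reading, no environment supports the central post hoc variant without changes, while the decentralized variants are available in most of them.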

Human readable scenarios

(These are the first of a variety of scenarios- others will include the various Phios customer scenarios, IWPC, and computational scenarios)

Teaching: taxonomy oriented (TNG)

Professor H uses TNG to teach students various options in construction at a Graduate School of Design (GSD). He uses a database shell. Students enter their taxonomies in their own copy of TNG during the course. If there are ten students a semester, each entering 40 entries of various construction methods, their steps and variants for the steps, there are 400 entries each semester. Of these there will be a high degree of redundancy. The entries will be ‘skeletal’ in that they will have a name but may have minimal descriptive text. Also, there may be hundreds of entries that were deleted by the students before submission. Professor H may distill a tree of 80 entries that reflects the best of the students' work that he keeps in his TNG. This may evolve over time. He may enter more descriptive text for these entries. No one is interested in the deleted entries. Professor H may be interested in keeping all of the students' work, either in his own TNG or as PIF files. The public is probably only interested in the distilled model.

Issues regarding what content is shared:

Is there a difference in whether student work should be public if it was presented to the class or not (>2 people issue)?

Technique for sharing

Since the students work on their own TNG’s, PIF is a good possibility if their work is desired.

Professor H’s work could also be shared as a PIF, or by running the RPH against his mdb.

One easy answer is "Send a PIF containing the distilled model after each semester". A Central database would PIF this in under a "Construct – views" bundle as a "Construct {GSD}" entry. The parts of the Construction process might also be classified under other views bundles (preferable) or left as Unclassified. The quality of the entries might influence the desirability of integrating the parts into the general taxonomy.

The Central Database could be either the RPH or EPH if PIF is used. I’m unsure how a PIF would get into the PH. Perhaps if the entire database is sent, a script could be developed from it.

The above scenario could also be achieved (with difficulty) by using the EPH or PH software. Instead of each student working on their own TNG, they would each enter their work over the web into the one webserver database. Keeping each student from seeing/interfering with other students’ work is problematic, as is keeping the distilled model secret until after the student entry.

Teaching: case oriented (EPH or PH)

Professor M uses a webserver to show the knowledge management structure for business cases. Each of 35 students prepares a case using Word. 5 are junk. All 35 are added to the structure by a TA and shared in class. No one is interested in the 5 junk cases. Sloan is interested in the 30. The public might be interested in the 30.

Issues regarding what content is shared:

Is there a difference if the student work is shared using Word and not entered into the PH/EPH?

Technique for sharing:

The easiest is to publicize the URL. As soon as work is entered it is available.

Research: taxonomy oriented (EPH)

Professor S is funded to develop an enhancement to the Business Activity Model incorporating an expanded view of Receiving (or Information Work). He uses the EPH and remotely enters his additions to the database. He specializes a few high level entries but dramatically increases the level of detail of their decomposition. He adds 100’s of images linked from the text. He leaves new ‘parts’ in the unclassified area. He uses the EPH to share this information with his sponsor.

Issues regarding what content is shared

Are the images linked from the entries part of the content that must be shared? What if they’re movie files that are very large? Or require unusual commercial software to display which the central db would have to purchase?

Technique for sharing

If the sponsor wants the information to be private, a fee must be paid. If not, the easiest technique again is to give a URL for the site or a page with links to the particular entries. An alternative is to send a PIF to a central site of the entries and also FTP or otherwise transfer the images. Ensuring that the image links still work could require editing.

Research: taxonomy oriented (TNG)

Professor W has a much different view of the basic taxonomy and moves things around such that there is only Create and Modify at the top level. While he creates some new entries, the bulk of his work recategorizes existing entries.

Issues regarding what content is shared

Are movements of entries covered?

Techniques for sharing

These changes would alter the base taxonomy if included by either PIF or script. These should probably be maintained under a separate browser.

Research: case oriented (custom software)

Professor B’s class uses customized PH software (cPH) to analyze exception handling in a dynamic environment. The cPH is linked to an ERP-like system and email. His class enters handling techniques in the cPH and runs scenarios to demonstrate how different situations could be handled. These cases can then be reviewed using the cPH and students comment on each other’s work using a discussion feature.

Issues regarding what content is shared

Are emails generated from the cPH part of the content? Is the discussion forum part of the content?

Techniques for sharing

If cPH is required to view the entries, should a central group run the environment? Or, again, is there just a link to Professor B’s site?

Business uses (Phios customers – all PH)

Chemical company

Chemical Company used the process repository to publish process knowledge within their corporate intranet. Although they never used it, Phios built a custom Visio import tool.

They were to use this tool to import existing processes that were documented using Visio. They made extensive use of linking. This included links to PDF files and links within their Documentum system. They did not use the repository taxonomy.

This type of customer would likely pay for exclusivity. If not, the sharing model is likely either PIF or a linked URL.

Industry Group

The Industry Group uses the repository to publish their models. Unlike all of our other customers the models are published over the internet although access is restricted to members only. Most of the model is connected to the repository taxonomy.

This type of customer might or might not pay for exclusivity. If not, the way to publish is over their own webserver. We could just link the URL.

Bank

Like the Chemical Company, Bank uses the repository to publish process knowledge within their corporate intranet. Unlike Chemical Company, Bank has many editors who create and update content using the web-based editor and the Phios Toolbar. They did not use the repository taxonomy.

Again, exclusivity is likely. If not, then using the webserver URL would be useful.

Manufacturer

Manufacturer’s use model was similar to Chemical Company and Bank. New features such as Measurements and Heat Maps were added to the product for decision support. They did not use the repository taxonomy.

Again, PIF or link the URL.

Telecom

Although Telecom has bought a license to the Repository they have never implemented an application. In theory they were going to use it in the typical way - to publish process knowledge within their corporate intranet.

Hospital Software

Phios has been working with Hospital Software to produce a custom Assessment Generator application built on the process repository. Hospital Software sells software solutions and consulting to the health care industry. Part of the selling process is to perform an assessment (questionnaire) to capture the current process used by each department within a hospital facility. A standardized process model developed by Hospital Software drives this assessment. The Generator application uses this model to build and email an Excel spreadsheet to the consultant performing the assessment. The data is then uploaded back into a specialization of the standard model. Reports are then produced (we are just starting this work now) and analysis is performed to determine how best the Hospital Software suite can be applied to the hospital. They do not use the repository taxonomy.

If they wished to share this, it could be done with a URL anchored somewhere within the taxonomy or by PIFing in the entire database.

Computational: web services (no software)

Benjamin described a situation where someone doesn’t use the software at all, but exports the content and edits/adds to it remotely and then links it to separate software. This modified content may not be directly usable by OPHI software. If we get files, it would just be for safekeeping. Interested public would contact the user directly for downloads of the files.

Suggestion for the OPHI environment:

I would suggest that all four techniques above get used over time.

The Wikipedia approach is the most problematic and also unlinked from the OPHI issue, so I would suggest we promote/wait for software that supports this. As this is not a very commercial option (one set of software required) it may be a while before the funding/energy is available.

The branded database would likely continue with a definite ‘standard entry’ requirement. We would need to establish author guidelines and an editorial/control policy for this. My assumption is that some users (my guess is mainly academics or commercial entities who want to sell their service/software) will want their entries put into the branded database. These users will be more likely to comply with authoring standards.

Decentralized webservers are the fastest way to grow and expand new entries. This is also by far the simplest option for any central group. Associated with the branded database should be a webpage that lists all of the publicly available sets of content, each with a brief description of the content (probably written by the creator), the URL of the webserver and a contact email. For ‘safekeeping’, a copy of the database, PIF or script should probably be sent at regular intervals in case the user site disappears. Note that these entries may not comply with any standards and may be highly problematic to import into the branded database. My impression is that many non-academic/non-commercial users will share content to avoid the fee and not because they want to publish their entries.
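
The ‘safekeeping’ idea above could be tracked with very little machinery. The Python sketch below is a minimal illustration under stated assumptions: the function name sites_due, the 91-day (roughly quarterly) interval and the example URLs are all hypothetical, and nothing here reflects existing OPHI software.

```python
from datetime import date, timedelta

# Assumption: roughly quarterly. The text above only says "regular intervals".
SAFEKEEPING_INTERVAL = timedelta(days=91)


def sites_due(last_copy_received, today, interval=SAFEKEEPING_INTERVAL):
    """Return the URLs whose last safekeeping copy is at least `interval` old.

    `last_copy_received` maps a site URL to the date its last database, PIF
    or script copy arrived, or to None if no copy has ever been received.
    """
    due = []
    for url, received in last_copy_received.items():
        if received is None or today - received >= interval:
            due.append(url)
    return sorted(due)


if __name__ == "__main__":
    # Hypothetical listing of decentralized sites and their last copies.
    listing = {
        "http://a.example.org": date(2003, 6, 1),
        "http://b.example.org": date(2003, 10, 1),
        "http://c.example.org": None,
    }
    for url in sites_due(listing, today=date(2003, 11, 5)):
        print("request a safekeeping copy from", url)
```

A central group could run a check like this against the webpage listing and contact only the creators whose copies are overdue.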

Decentralized files are the least desirable, but may be the quickest way to disseminate non-OPHI software content. If this is useful enough to put into the branded database, then re-authoring might be required.