Story Analyzer App

What is the Story Analyzer (SA) Web App?

The Story Analyzer web application enables users to create and visualize dashboards that help them understand a story through natural language processing (NLP) and data visualization. Using Stanford's CoreNLP Java library, the system extracts narrative information through a story analysis process and generates a dashboard of data visualizations for a particular story. The visualizations are built with D3 and Google Maps.

In sum, Story Analyzer helps to visually and interactively answer these questions: Who are the main characters of the story (both people and groups), and how do they interact with each other? Where and when do these interactions happen? What are some major themes and contexts of the story?

More about the research and background behind Story Analyzer can be found at this link: http://storyanalyzer.org/.

The author of Story Analyzer is Mike Mitri, a professor emeritus of Computer Information Systems at James Madison University.

How do I add a story to my library?

In the navigation bar, click on the tab labeled New Story. There you can input or upload the details of the story: a title, the source (typically the story's URL on the web), and the full text of the story. It is often useful to edit this text before submitting. For example, a story taken from a news feed on the web may include extraneous links and advertisements. If these are irrelevant to you, delete them and keep only the narrative itself. When you submit the new story, you will be taken to the Request Analysis page, where you can select this or another of your stories to be analyzed by the Story Analyzer engine. To see a full list of your stories, click Library in the navigation bar.

How can I analyze and extract information from a story?

In the navigation bar, click on the tab labeled Request Analysis. There, you can pick a saved story to begin the analysis process. Analysis takes some time to complete, depending on the volume of text and the current demand on the system. For a 5000-word story, the typical processing time is about 30 minutes. If other users' requests are being processed, yours will be queued.

Note that some stories, such as news stories, have a publish date that is relevant for interpreting the time frame of the story. For these cases, you should include the narrative date. Enter this in the format: YYYY-MM-DD.
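If you prepare narrative dates outside the app, a quick check like the one below can catch formatting mistakes before you submit. This is a minimal sketch, not Story Analyzer's own validation: the function name is hypothetical, and it assumes "YYYY-MM-DD" means a four-digit year, a two-digit month, and a two-digit day that together form a real calendar date.

```python
import re
from datetime import date

# Assumed rule: exactly 4-digit year, 2-digit month, 2-digit day, separated by hyphens.
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_narrative_date(text: str) -> bool:
    """Return True if text is a real calendar date written as YYYY-MM-DD (hypothetical helper)."""
    if not _DATE_PATTERN.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)  # rejects impossible dates such as 2023-02-30
    except ValueError:
        return False
    return True


print(is_valid_narrative_date("2023-07-04"))  # True
print(is_valid_narrative_date("07/04/2023"))  # False
print(is_valid_narrative_date("2023-02-30"))  # False
```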

When the analysis is complete, you can visualize the results in a dashboard by going to the Visualize page. If necessary, you can edit, correct, and refine the results using the Edit page.

Any story in your Library can be edited as you desire. You can request another analysis on the story as many times as you wish.

What happens during SA analysis?

Story Analyzer uses the CoreNLP software library, created by Stanford University’s NLP Group. CoreNLP provides many NLP services for processing and analyzing text documents, including:

• Breaking the text into individual sentences (sentence splitting)
• Tokenizing a sentence (breaking it into individual words)
• Identifying parts of speech (POS) within a sentence (nouns, verbs, adjectives, adverbs, etc.).
• Named entity recognition (NER) – recognizing names of people, places, organizations, dates/times, etc.
• Lemmas – the root form of each word (token)
• Constituency parsing – constructing the phrase-structure tree of a sentence, showing how its words group into noun phrases, verb phrases, and other constituents.
• Dependency parsing – constructing the graph of linguistic relationships between pairs of terms in a sentence. Examples include adjectival modifiers, compound nouns, prepositions, and subject/action/object relationships (De Marneffe and Manning, 2008).
• Co-reference resolution – finding all expressions that refer to the same entity in a text. Useful for matching pronouns to people or finding common themes in the text.
• Temporal tagging – recognizing and normalizing temporal expressions (e.g., “next Wednesday”, “the previous summer”, “July of next year”, “Labor Day of 2023”, “yesterday”, etc.)
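In CoreNLP, each of these services is provided by a named "annotator", and a pipeline is usually configured by listing annotators in its annotators property, with each one after its prerequisites. The sketch below maps the services above to CoreNLP's annotator names and builds that property value. It is illustrative only: the mapping dictionary and helper are hypothetical, and whether Story Analyzer requests exactly this set and order is an assumption. In recent CoreNLP versions, temporal tagging (SUTime) runs inside the ner annotator by default rather than as a separate one.

```python
# Hypothetical mapping from the services listed above to CoreNLP annotator names.
# The annotator names are CoreNLP's; Story Analyzer's actual configuration is assumed.
SERVICE_TO_ANNOTATOR = {
    "tokenizing": "tokenize",
    "sentence splitting": "ssplit",
    "parts of speech": "pos",
    "lemmas": "lemma",
    "named entity recognition and temporal tagging": "ner",  # SUTime runs within ner by default
    "constituency parsing": "parse",
    "dependency parsing": "depparse",
    "co-reference resolution": "coref",
}


def annotators_property() -> str:
    """Join annotator names in dict order, which lists each prerequisite first."""
    return ",".join(SERVICE_TO_ANNOTATOR.values())


print("annotators = " + annotators_property())
# annotators = tokenize,ssplit,pos,lemma,ner,parse,depparse,coref
```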

CoreNLP's text processing is computationally intensive, and a story of 5000 words will typically take 30 minutes to process. After Story Analyzer receives the results from CoreNLP, it performs a step called information extraction. Specifically, SA identifies the characters of a story, their interactions, the times and places when the interactions occurred, the "themes" of the story (identified by frequently occurring terms/phrases), and other pertinent information in the narrative.
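The idea of finding themes through frequently occurring terms can be sketched as a simple frequency count over lemmas. This is an assumption-laden illustration, not SA's actual method: it counts single lemmas only (SA also considers phrases), and the stop-word list and function name are hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "he", "she", "they", "it"}


def candidate_themes(lemmas, top_n=3):
    """Return the top_n most frequent content lemmas as (lemma, count) pairs."""
    counts = Counter(
        lemma.lower()
        for lemma in lemmas
        if lemma.isalpha() and lemma.lower() not in STOP_WORDS  # skip punctuation and stop words
    )
    return counts.most_common(top_n)


lemmas = ["the", "flood", "destroy", "the", "bridge", "and", "the", "flood",
          "reach", "the", "town", ".", "town", "leader", "rebuild", "the", "bridge",
          "after", "the", "flood"]
print(candidate_themes(lemmas))  # [('flood', 3), ('bridge', 2), ('town', 2)]
```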

From the data created during this processing, SA produces interactive visualizations, which can be combined into dashboards, giving users the ability to "read" complex text via images. As the saying goes, "a picture is worth a thousand words."

What does it mean to visualize a story?

Any analyzed story can be visualized via the Visualize page. This page gives you a wide variety of visualizations for a dashboard that you can customize for your needs. The visualizations include the following:

Word clouds depicting people, groups, and other entities of the story
An interactions chord and a narrative web depicting the interactions and relationships between actors in the story
A calendar depicting when actions and events occurred in the story
A map depicting where actions and events occurred in the story
The highlighted text, which appears when you select an item from one of the other visualizations

If necessary, you can edit, correct, and refine the results using the Edit page.

Any story in your Library can be edited as you desire. You can always request another analysis on the story as many times as you wish.

How do I interact with the visualizations?

Interacting with a visualization is very simple. You can hover your mouse over an element (e.g. a word/phrase in a cloud, a location in a map, a date on a calendar, etc.). If you do this, the related elements in other visualizations will be highlighted, and the relevant sentences in the story will be shown.

If you click an element, this freezes the screen, so new hovering doesn't change things. When in "freeze" mode, you can see associations between elements. For example, if you click a word in a cloud to enter "freeze" mode, then hovering over other related words in this or other clouds will show the sentences in which both words appear. So, to see the sentences that mention a particular pair of people, click one name to freeze the screen and then hover over the other. To "unfreeze", simply click on an element again.
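The "pair of people" lookup that freeze mode enables amounts to keeping only the sentences that mention both terms. Here is a minimal sketch of that query using plain text matching. It is hypothetical: Story Analyzer presumably works from its stored analysis data (tokens and mentions) rather than searching raw text, and the function name is invented for illustration.

```python
import re


def sentences_with_both(sentences, first, second):
    """Return (sentence number, text) for each sentence that mentions both terms.

    Terms are matched as whole words, ignoring case.
    """
    def contains_term(term, text):
        return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None

    return [
        (number, text)
        for number, text in enumerate(sentences, start=1)
        if contains_term(first, text) and contains_term(second, text)
    ]


story = [
    "Alice met Bob in Paris.",
    "Bob left for Rome.",
    "Later, Alice wrote to Bob and Carol.",
]
for number, text in sentences_with_both(story, "Alice", "Bob"):
    print(number, text)
```

Whole-word matching keeps a short name such as "Al" from matching inside "Alice".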

These videos show interactivity in action. The first uses word clouds for people and groups along with the interactions visualization; the second shows what happens with maps and calendars.

Why would I want to edit analysis results, and how do I do it?

When you Visualize an analyzed story via a dashboard, you may see errors or things that need correcting. Natural Language Processing (NLP) and Artificial Intelligence are amazing technologies, but sometimes they get things wrong. For example, you may find a mistake regarding people or groups, places or times, or the associations between the people and groups in the story. When you see corrections that need to be made, you can make them on the Edit page.

From this page, you can retrieve and modify much of the data received from the NLP processing. These editing features include the following:

Details about each sentence in the story, including its tokens (words), parts of speech (POS) for each token, named entity recognition (NER) types, etc.
Details about each individual token of the story, through which you can also modify the above details.
Details about the coreferences in the story, enabling you to connect ideas from separate sentences.
Details about the times and locations of the story

For each of these options, you can make editing changes which are then reflected in the Visualize page.

What are tokens, and how can I edit and correct their details?

When CoreNLP processes a text document, it splits the text into sentences, and splits each sentence into tokens. A token is typically a word in the sentence. If the word is a contraction (such as "isn't"), it is actually two tokens, one for "is" and the other for "n't", which is interpreted as "not". Also, some punctuation marks, such as parentheses and commas, are returned by CoreNLP as tokens of a sentence. Each token has the following attributes:

The token value itself.
The token's part of speech (POS). It could be a verb or noun or adjective or adverb, or other possible parts of speech. CoreNLP uses the Penn Treebank tag set to identify each token's POS.
The token's named entity type, identified via CoreNLP's Named Entity Recognition (NER) algorithm.
The token's lemma. A lemma is like the root word for a token. For example, a verb like "running" would have "run" as a lemma.
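One way to picture these four attributes is as a small record per token. The sketch below shows a tokenized sentence containing a contraction. The Token class is hypothetical, and the tag values are typical Penn Treebank and CoreNLP-style labels ("O" marks a token that is not part of a named entity) chosen for illustration; actual CoreNLP output for a given sentence may differ.

```python
from dataclasses import dataclass


@dataclass
class Token:
    value: str  # the token text as it appears in the sentence
    pos: str    # Penn Treebank part-of-speech tag, e.g. "VBZ"
    ner: str    # named entity type, or "O" when the token is not part of an entity
    lemma: str  # root form of the token


# Illustrative tokens for "Maria isn't running." (note the contraction becomes two tokens)
sentence = [
    Token("Maria", "NNP", "PERSON", "Maria"),
    Token("is", "VBZ", "O", "be"),
    Token("n't", "RB", "O", "not"),
    Token("running", "VBG", "O", "run"),
    Token(".", ".", "O", "."),
]
for t in sentence:
    print(f"{t.value:8} {t.pos:4} {t.ner:7} {t.lemma}")
```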

Story Analyzer uses NER results to classify story entities as people, groups, locations, or dates. The NER classification for people is PERSON. Groups include NERs such as ORGANIZATION, NATIONALITY, RELIGION, or IDEOLOGY. Location NERs include LOCATION, CITY, STATE_OR_PROVINCE, and COUNTRY. Date NERs include DATE and TIME.
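The classification just described is essentially a lookup from NER label to Story Analyzer category. Here is a minimal sketch using only the labels named above. Because the groups list is described as examples ("such as"), SA's real mapping may include more labels. The dictionary and function names are hypothetical.

```python
from typing import Optional

# NER labels per Story Analyzer category, as listed above (groups may not be exhaustive).
NER_CATEGORIES = {
    "people": {"PERSON"},
    "groups": {"ORGANIZATION", "NATIONALITY", "RELIGION", "IDEOLOGY"},
    "locations": {"LOCATION", "CITY", "STATE_OR_PROVINCE", "COUNTRY"},
    "dates": {"DATE", "TIME"},
}


def sa_category(ner: str) -> Optional[str]:
    """Return the SA category for a CoreNLP NER label, or None if SA doesn't classify it."""
    for category, labels in NER_CATEGORIES.items():
        if ner in labels:
            return category
    return None


print(sa_category("CITY"))   # locations
print(sa_category("MONEY"))  # None
```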

The techniques that CoreNLP uses for determining token properties like NER or POS rely largely on machine learning. Sometimes the machine learning algorithms produce erroneous results. For example, a word referring to an organization may mistakenly be interpreted as a person or vice versa. So, Story Analyzer allows you to edit tokens and make corrections when needed.

There are two ways to edit token properties. One is to find all tokens of a particular value throughout the whole text and modify them, as shown here:

Alternatively, you can focus on a particular sentence, as shown here:

For each of these options, you can make editing changes, which are then reflected in the Visualize page. For example, if you had seen a person's name in the groups word cloud or vice versa, making the correction will put it in the proper cloud.
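The first editing option, changing every token with a particular value, can be pictured as a bulk update over the story's tokens. The sketch below relabels an organization that was mistaken for a person. It is an illustration under assumptions: the Token fields and the relabel_all function are hypothetical, not SA's data model or code.

```python
from dataclasses import dataclass


@dataclass
class Token:
    sentence: int  # sentence number
    index: int     # token position within the sentence
    value: str
    ner: str


def relabel_all(tokens, value, new_ner):
    """Set the NER type of every token whose text equals value; return how many changed."""
    changed = 0
    for tok in tokens:
        if tok.value == value and tok.ner != new_ner:
            tok.ner = new_ner
            changed += 1
    return changed


tokens = [
    Token(1, 1, "Apex", "PERSON"),   # an organization mislabeled as a person
    Token(1, 2, "hired", "O"),
    Token(1, 3, "Dana", "PERSON"),
    Token(2, 5, "Apex", "PERSON"),
]
print(relabel_all(tokens, "Apex", "ORGANIZATION"))  # 2
```

Editing a single sentence instead would apply the same change only to tokens whose sentence number matches.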

What are coreferences, and how do I edit them?

Coreference resolution is a natural language processing technique to tie elements and ideas of a story together. A very common example is to link full and/or partial names that come up several times throughout the text and to show that they refer to the same person, group, place, or time. Coreference resolution also attempts to associate a pronoun (he, she, they) with a particular person or group. Other coreferences associate similar phrases or ideas with each other. Stanford's CoreNLP has a coreference resolution algorithm that produces data with the following structure.

Each coreference item is called a mention. Mentions are grouped together in coreference chains. The first mention of a chain is called its coreference head. Each mention of a coreference chain has the following attributes:

The Mention ID. An identifier for the mention.
The Chain ID. This is the chain that the mention belongs to.
The Sentence Number. Each sentence may have several mentions embedded in it.
The Start and End Token numbers. A mention is typically a phrase consisting of multiple sequential tokens.
The Mention Type. Can be PROPER (a proper noun, a name of some sort), PRONOMINAL (a pronoun), LIST (referring to more than one entity), or NOMINAL (something else).
The Animacy. Can be ANIMATE (a living being), or INANIMATE (not a living being).
The Gender. Can be MALE, FEMALE, or UNKNOWN.
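The structure above can be sketched as a mention record plus a grouping step that collects mentions into chains and takes each chain's first mention as its head. This is a hypothetical model, not CoreNLP's or SA's actual classes. It assumes "first" means earliest in the text (lowest sentence number, then lowest start token) and adds an illustrative text field for readability.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    mention_id: int
    chain_id: int
    sentence: int
    start_token: int
    end_token: int
    mention_type: str  # PROPER, PRONOMINAL, LIST, or NOMINAL
    animacy: str       # ANIMATE or INANIMATE
    gender: str        # MALE, FEMALE, or UNKNOWN
    text: str          # hypothetical convenience field: the mention's words


def build_chains(mentions):
    """Group mentions by chain ID, ordering each chain by position in the text."""
    chains = defaultdict(list)
    for m in mentions:
        chains[m.chain_id].append(m)
    for chain in chains.values():
        chain.sort(key=lambda m: (m.sentence, m.start_token))
    return dict(chains)


def chain_head(chain):
    """The coreference head: the first mention of the chain."""
    return chain[0]


mentions = [
    Mention(3, 7, 2, 1, 2, "PRONOMINAL", "ANIMATE", "FEMALE", "She"),
    Mention(1, 7, 1, 1, 3, "PROPER", "ANIMATE", "FEMALE", "Maria Lopez"),
    Mention(2, 8, 1, 5, 7, "NOMINAL", "INANIMATE", "UNKNOWN", "the bridge"),
]
chains = build_chains(mentions)
print(chain_head(chains[7]).text)   # Maria Lopez
print([m.text for m in chains[7]])  # ['Maria Lopez', 'She']
```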

Coreference resolution is an amazing thing, and illustrates the power of artificial intelligence. CoreNLP has three versions of coreference resolution, the most accurate of which is implemented using machine learning via a neural network. But it's not perfect. Machine learning models have varying degrees of accuracy. If they work with structured data like what you see in databases and spreadsheets, they can be very accurate. But for something as complex and intuitive as understanding a story, they often get things wrong. CoreNLP's neural network coreference resolution algorithm has an accuracy level of only a little over 60 percent. That means that almost 40 percent of the time it gives the wrong results. We should all be a little skeptical about the answers AI gives us, especially when it comes to understanding a story.

So, coreference resolution sometimes gives us incorrect content. But, despite its limitations, CoreNLP's coreference resolution gives a very useful data structure that helps Story Analyzer stitch together the flow of a story. SA's editing features allow the user to make corrections to the CoreNLP results and thereby produce more accurate dashboards. The coreference editing enabled by SA includes viewing and editing the following:

Coreference details for a token value (e.g., a particular word, such as a person's name). This allows you to edit details of every mention that includes that token value.
Coreference details for a sentence number. All mentions of a particular sentence are shown, and you can change what you need.
Coreference details for a chain ID. All mentions of a coreference chain are shown, and you can change what you need.
Coreference details for pronouns. All mentions that are PRONOMINAL are shown, and you can make corrections as needed.
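Each of these four views is a filter over the story's mentions, and a typical correction is moving a mention to a different chain. The sketch below shows the filters and fixes a pronoun that was linked to the wrong person. All names here (the trimmed Mention class, its tokens field, and the helper functions) are hypothetical illustrations, not SA's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    mention_id: int
    chain_id: int
    sentence: int
    mention_type: str  # PROPER, PRONOMINAL, LIST, or NOMINAL
    tokens: List[str]  # hypothetical: the token values spanned by the mention


def by_token_value(mentions, value):
    return [m for m in mentions if value in m.tokens]


def by_sentence(mentions, sentence):
    return [m for m in mentions if m.sentence == sentence]


def by_chain(mentions, chain_id):
    return [m for m in mentions if m.chain_id == chain_id]


def pronominal(mentions):
    return [m for m in mentions if m.mention_type == "PRONOMINAL"]


mentions = [
    Mention(1, 1, 1, "PROPER", ["Tom", "Reyes"]),
    Mention(2, 2, 1, "PROPER", ["Ana"]),
    Mention(3, 2, 2, "PRONOMINAL", ["he"]),  # wrongly linked to Ana's chain
]

# Correction: move "he" into Tom Reyes's chain.
for m in pronominal(mentions):
    if m.tokens == ["he"]:
        m.chain_id = 1

print([m.mention_id for m in by_chain(mentions, 1)])  # [1, 3]
```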