The Story Analyzer web application enables users to create and visualize dashboards that assist with understanding a story through the use of natural language processing (NLP) and data visualization. Using Stanford's CoreNLP Java library, the system extracts narrative information via a story analysis process and generates a dashboard of data visualizations for a particular story. The visualizations are created with D3 and Google Maps.
In sum, Story Analyzer helps to visually and interactively answer these questions: Who are the main characters of the story (both people and groups), and how do they interact with each other? Where and when do these interactions happen? What are some major themes and contexts of the story?
More about the research and background behind Story Analyzer can be found at this link: http://storyanalyzer.org/.
The author of Story Analyzer is Mike Mitri, a professor emeritus of Computer Information Systems at James Madison University.
In the navigation bar, click on the tab labeled New Story. There you can input or upload the details of the story: a title, the source (typically the URL of the story on the web), and the full text. It is often useful to edit this text before submitting. For example, a story from a news feed on the web may include extraneous links and advertisements. If these are irrelevant to you, you can delete them and keep only the narrative itself. When you submit the new story, you will be taken to the Request Analysis page. Here, you can select this or another of your stories to be analyzed by the Story Analyzer engine. To see a full list of your created stories, click on Library in the navigation bar.
In the navigation bar, click on the tab labeled Request Analysis. There, you can pick a saved story to begin the analysis process. The analysis will take some time to complete, depending on the volume of text as well as the current demand on the system. For a 5000-word story, the typical processing time is about 30 minutes. If other user requests are being processed, yours will be queued.
Note that some stories, such as news stories, have a publish date that is relevant for interpreting the time frame of the story. For these cases, you should include the narrative date. Enter this in the format: YYYY-MM-DD.
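The YYYY-MM-DD format can be checked before a story is submitted. The following is a minimal sketch of such a check, not Story Analyzer's own validation; the function name parse_narrative_date is hypothetical and used only for illustration.

```python
import re
from datetime import date, datetime

# Strict YYYY-MM-DD shape: four-digit year, two-digit month, two-digit day.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_narrative_date(text: str) -> date:
    """Return a date for a YYYY-MM-DD string, or raise ValueError.

    Hypothetical helper for illustration; not part of Story Analyzer.
    """
    candidate = text.strip()
    if not _DATE_SHAPE.fullmatch(candidate):
        raise ValueError(f"expected YYYY-MM-DD, got {text!r}")
    # strptime also rejects impossible calendar dates such as 2023-02-30.
    return datetime.strptime(candidate, "%Y-%m-%d").date()
```

A string such as "2021-03-15" is accepted, while "03/15/2021" or "2021-3-15" raises ValueError.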
When the analysis is complete, you can visualize the results in a dashboard by going to the Visualize page. If necessary, you can edit, correct, and refine the results using the Edit page.
Any story in your Library can be edited as you desire. You can request another analysis on the story as many times as you wish.
Story Analyzer uses the CoreNLP software library, created by Stanford University’s NLP Group. CoreNLP provides many NLP services for processing and analyzing text documents, including:
• Breaking the text into individual sentences (sentence splitting)
CoreNLP's text processing is computationally intensive, and a story of 5000 words will typically take 30 minutes to process. After Story Analyzer receives the results from CoreNLP, it performs information extraction. Specifically, SA identifies the characters of a story, their interactions, the times and places when the interactions occurred, the "themes" of the story (identified by frequently occurring terms/phrases), and other pertinent information of the narrative.
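To give a sense of what identifying themes by frequently occurring terms can look like, here is a minimal sketch based on plain word counting. It is an assumption for illustration, not SA's actual extraction method: the stopword list is a deliberately tiny, hypothetical one, and the function name frequent_terms is made up.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "was", "were"}


def frequent_terms(text: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Return the top_n most frequent non-stopword terms with their counts."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)
```

For a short text about a flood that damages a town and a bridge, the highest-ranked terms would be words such as "flood", "town", and "bridge", which is the kind of list a theme word cloud could be built from.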
From the data created during this processing, SA produces interactive visualizations, which can be combined into dashboards, to provide users with a capability to "read" complex text via images. "A picture tells a thousand words".
Any analyzed story can be visualized via the Visualize page. This page gives you a wide variety of visualizations for a dashboard that you can customize for your needs. The visualizations include the following:
If necessary, you can edit, correct, and refine the results using the Edit page.
Any story in your Library can be edited as you desire. You can always request another analysis on the story as many times as you wish.
Interacting with a visualization is very simple. You can hover your mouse over an element (e.g. a word/phrase in a cloud, a location in a map, a date on a calendar, etc.). If you do this, the related elements in other visualizations will be highlighted, and the relevant sentences in the story will be shown.
Clicking on an element freezes the screen, so that new hovering no longer changes the selection. When in "freeze" mode, you can see associations between elements. For example, if you click on a word in a cloud to enter "freeze" mode, then hovering over other related words in this or other clouds will show the sentences in which both words appear. So, if you want to easily see sentences with a pair of people in them, you can do this by clicking on one name to freeze it and then hovering over another name. To "unfreeze", simply click on an element again.
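The hover and freeze behavior can be thought of as a simple filter over sentences. The sketch below assumes each sentence is stored together with the set of elements (names, places, dates) found in it; the Sentence class and sentences_to_show function are hypothetical names for illustration, not Story Analyzer's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sentence:
    """A sentence plus the elements found in it (assumed structure)."""
    text: str
    elements: set[str] = field(default_factory=set)


def sentences_to_show(
    sentences: list[Sentence], hovered: str, frozen: Optional[str] = None
) -> list[str]:
    """Return the sentences to display for the current mouse state.

    With no frozen element, show every sentence containing the hovered one.
    In freeze mode, show only sentences containing both elements.
    """
    required = {hovered} if frozen is None else {hovered, frozen}
    # A sentence qualifies when it contains every required element.
    return [s.text for s in sentences if required <= s.elements]
```

Hovering over "Bob" alone would return every sentence that mentions Bob, while freezing "Alice" and then hovering over "Bob" would return only the sentences that mention both.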
These videos show interactivity in action. One is with word clouds for people and groups, along with the interactions visualization. The second shows what happens with maps and calendars.
When you Visualize an analyzed story via a dashboard, you may see errors or things that need correcting. Natural Language Processing (NLP) and Artificial Intelligence are amazing technologies, but sometimes they get things wrong. For example, you may find a mistake regarding people or groups, places or times, or the associations between the people and groups in the story. When you see corrections that need to be made, you can do this via the Edit page.
From this page, you can retrieve and modify much of the data received from the NLP processing. These editing features include the following:
For each of these options, you can make editing changes which are then reflected in the Visualize page.
When CoreNLP processes a text document, it splits the text into sentences, and splits each sentence into tokens. A token is typically a word in the sentence. A contraction (such as "isn't") is actually two tokens: one for "is", and the other for "n't", which is interpreted as "not". Some punctuation marks, such as parentheses or commas, are also returned as tokens for a sentence. Each token has the following attributes:
Story Analyzer uses NER results to classify story entities as either people, groups, locations or dates. The NER classification for people is PERSON. Groups include NERs such as ORGANIZATION, NATIONALITY, RELIGION, or IDEOLOGY. Location NERs include LOCATION, CITY, STATE_OR_PROVINCE, and COUNTRY. Date NERs include DATE and TIME.
The techniques that CoreNLP uses for determining token properties like NER or POS rely largely on machine learning. Sometimes the machine learning algorithms produce erroneous results. For example, a word referring to an organization may mistakenly be interpreted as a person or vice versa. So, Story Analyzer allows you to edit tokens and make corrections when needed.
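The grouping of NER tags into entity types can be pictured as a lookup table. The sketch below uses only the tags named above; since the text says groups include NERs "such as" those listed, the real mapping may contain more entries. The names NER_CATEGORIES and classify_entity are hypothetical.

```python
from typing import Optional

# Mapping from CoreNLP NER tag to entity type, built only from the tags
# named in the text; the actual list used by Story Analyzer may be longer.
NER_CATEGORIES = {
    "PERSON": "people",
    "ORGANIZATION": "groups",
    "NATIONALITY": "groups",
    "RELIGION": "groups",
    "IDEOLOGY": "groups",
    "LOCATION": "locations",
    "CITY": "locations",
    "STATE_OR_PROVINCE": "locations",
    "COUNTRY": "locations",
    "DATE": "dates",
    "TIME": "dates",
}


def classify_entity(ner_tag: str) -> Optional[str]:
    """Return the entity type for a tag, or None for any other tag."""
    return NER_CATEGORIES.get(ner_tag)
```

A token tagged CITY would be classified as a location, while a token whose tag is not in the table (for example "O", which CoreNLP assigns to tokens that are not named entities) would not be classified at all.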
There are two ways to edit token properties. One is to find all tokens of a particular value throughout the whole text and modify them, as shown here:
Alternatively, you can focus on a particular sentence, as shown here:
For each of these options, you can make editing changes which are then reflected in the Visualize page. For example, if you had seen a person's name in the groups word cloud or vice versa, then making the corrections will put it in the proper cloud.
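The two editing modes, changing every token of a given value or only those in one sentence, can be sketched as a single relabeling routine. This is an illustrative assumption about the underlying data, not Story Analyzer's implementation; the Token class and relabel function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """A token with its sentence position and NER tag (assumed structure)."""
    sentence_index: int
    word: str
    ner: str


def relabel(
    tokens: list[Token],
    word: str,
    new_ner: str,
    sentence_index: Optional[int] = None,
) -> int:
    """Set the NER tag on matching tokens and return how many changed.

    With sentence_index=None, every token equal to `word` is edited;
    otherwise only tokens in that one sentence are edited.
    """
    changed = 0
    for token in tokens:
        in_scope = sentence_index is None or token.sentence_index == sentence_index
        if in_scope and token.word == word and token.ner != new_ner:
            token.ner = new_ner
            changed += 1
    return changed
```

For instance, if "Amazon" was tagged LOCATION throughout a story about the company, one whole-text call could change every occurrence to ORGANIZATION, which would move the word from the locations visualization to the groups cloud.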
Coreference resolution is a natural language processing technique to tie elements and ideas of a story together. A very common example is to link full and/or partial names that come up several times throughout the text and to show that they refer to the same person, group, place, or time. Coreference resolution also attempts to associate a pronoun (he, she, they) to a particular person or group. Other coreferences associate similar phrases or ideas with each other. Stanford's CoreNLP has a coreference resolution algorithm that produces data of the following structure.
Each coreference item is called a mention. Mentions are grouped together in coreference chains. The first mention of a chain is called its coreference head. Each mention of a coreference chain has the following attributes:
Coreference resolution is an amazing thing, and illustrates the power of artificial intelligence. CoreNLP has three versions of coreference resolution, the most accurate of which is implemented using machine learning via a neural network. But it's not perfect. Machine learning models have varying degrees of accuracy. If they work with structured data, like what you see in databases and spreadsheets, they can be very accurate. But for something as complex and intuitive as understanding a story, they often get things wrong. CoreNLP's neural network coreference resolution algorithm has an accuracy level of only 60+ percent. That means that almost 40% of the time it gives the wrong results. We should all be a little skeptical about the answers AI gives us, especially when it comes to understanding a story.
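To make the terms mention, chain, and head concrete, here is a minimal sketch of how such a structure could be represented. The field names are hypothetical and chosen only for illustration; they are not CoreNLP's actual mention attributes or Story Analyzer's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    """One coreference item (hypothetical fields for illustration)."""
    sentence_index: int
    text: str


@dataclass
class CorefChain:
    """A group of mentions that refer to the same thing."""
    mentions: list[Mention] = field(default_factory=list)

    @property
    def head(self) -> Mention:
        # The first mention of a chain is its coreference head.
        return self.mentions[0]


def resolve(chains: list[CorefChain], sentence_index: int, text: str) -> str:
    """Return the head text for a mention, or the text itself if unchained."""
    for chain in chains:
        for mention in chain.mentions:
            if mention.sentence_index == sentence_index and mention.text == text:
                return chain.head.text
    return text
```

With a chain made of "Marie Curie", "Curie", and "she", resolving the pronoun "she" in its sentence would yield "Marie Curie". If the chain is wrong, correcting its mentions corrects every lookup that depends on it.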
So, coreference resolution sometimes gives us incorrect content. But, despite its limitations, CoreNLP's coreference resolution gives a very useful data structure that helps Story Analyzer stitch together the flow of a story. SA's editing features allow the user to make corrections to the CoreNLP results and thereby produce more accurate dashboards. The coreference editing enabled by SA includes viewing and editing the following: