Mine Laboratory, Advanced Software Engineering Course, Intelligent Information Environment Laboratory



All of our research themes apply machine learning and deep learning to data, at scales ranging from small to large. They differ in purpose: depending on the content of the data, we learn semantic representations of the data, study how to apply machine learning and deep learning, extract meaning and patterns from the data, or make recommendations.

Educational Data Mining

  1. Prediction of learners' learning status, learning ability, and changes in performance, with explanation of the reasons for such changes
    • Data used: learners' reflections, instructors' reports, regular exam results, etc.
    • Target: Class grades, mock exam grades, and the level of student understanding given by the instructor.
    • Description: We identify features (words and expressions) in reflective essays and instructor reports that relate to learning status and learning ability, determine their degree of importance, and use them to estimate learning status and learning ability.
    • Comment: In addition to our ongoing research on university students' comments, we collaborate with cram schools and use junior high school students' comments, teachers' reports, and similar data to study grade prediction, learning status estimation, grade change estimation, and generation of teachers' reports.
  2. Automatic generation of feedback sentences for learners
    • Data used: learners' comments, teachers' reports, and regular exam results.
    • Target: Advice sentences for learners and instructors' report sentences
    • Description: Using machine learning and deep learning, we automatically generate advice sentences that respond to students' comments, and generate teacher report sentences from a few keywords.
  3. Automatic scoring of short-answer texts (joint research with the National Center for University Entrance Examinations)
    • Data used: Rubrics (scoring criteria), model answers, student answer texts
    • Target: Scores of learners' written answers
    • Description: Using machine learning and deep learning, we estimate the scores of students' answer texts from the rubric (scoring criteria) and model answers, relying as little as possible on already-scored student answers (labeled training data).
    • Comment: We are developing an efficient method with a short training time that also aims for new state-of-the-art (SOTA) accuracy.
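
As a rough illustration of scoring from a rubric alone, without any labeled student answers, the following sketch awards points for each rubric item whose keywords all appear in an answer. It is only a keyword-matching baseline written for this page, not the laboratory's method; the function names, the rubric format, and the example rubric are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rubric_score(answer, rubric):
    """Sum the points of every rubric item fully covered by the answer.

    `rubric` is a list of (points, keywords) pairs; this format is an
    assumption made for the sketch, not a real scoring standard.
    """
    tokens = tokenize(answer)
    total = 0
    for points, keywords in rubric:
        # An item counts only if all of its keywords occur in the answer.
        if all(keyword in tokens for keyword in keywords):
            total += points
    return total


# Hypothetical rubric: 2 points for naming the process, 1 for the energy source.
EXAMPLE_RUBRIC = [
    (2, {"photosynthesis"}),
    (1, {"light", "energy"}),
]

if __name__ == "__main__":
    print(rubric_score("Plants use light energy in photosynthesis.", EXAMPLE_RUBRIC))
```

A learned model would replace the exact keyword test with a semantic similarity between the answer and each rubric item or model answer; the baseline only shows where the rubric enters the computation.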

Smart Mobility

  1. Estimation of travel time and delay time between bus stops using bus probe data, and estimation of waiting time at Kyushu University Gakkentoshi Station
    • Data used: Showa bus probe data, bus timetables, JR timetables at Kyushu University Gakkentoshi Station
    • Targets: Travel time between bus stops, delay time estimation, waiting time at Kyushu University Gakkentoshi Station
    • Description: We collect probe data from Showa buses entering Ito Campus to estimate the number of stops, stop time, travel time and delay time between bus stops, and waiting time at Kyushu University Gakkentoshi Station. The estimates are applied to assessing road conditions and road characteristics, investigating the causes of long travel times and delays, and identifying bus services that reduce passenger satisfaction, together with the conditions and reasons, taking into account the connections between buses and trains. We also estimate driver characteristics.
  2. Estimation of driver characteristics, dangerous driving, etc. based on drive recorder data
    • Data used: Probe data (position, date/time, day of the week, acceleration, speed, vehicle ID), weather information, video information
    • Targets: Driver characteristics, dangerous driving conditions, dangerous locations, etc.
  3. Analysis of the mobility situation in local communities
    • Data used: On-demand bus usage history data
    • Target: Boarding and alighting patterns of users
    • Description: Estimation of local mobility patterns based on on-demand bus data, as well as proposals that may be useful for improving mobility conditions through data analysis
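
The travel-time, delay, and waiting-time quantities mentioned for the bus probe data can be illustrated with a small sketch. It assumes the probe data has already been reduced to one arrival record per trip and stop, which is an assumption about preprocessing rather than a description of the actual pipeline; the record layout, stop names, and function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical records: (trip_id, stop, observed arrival, scheduled arrival).
RECORDS = [
    ("trip1", "Station", "08:00:30", "08:00:00"),
    ("trip1", "StopA", "08:07:00", "08:05:00"),
    ("trip1", "Campus", "08:15:30", "08:12:00"),
    ("trip2", "Station", "09:00:00", "09:00:00"),
    ("trip2", "StopA", "09:05:30", "09:05:00"),
    ("trip2", "Campus", "09:12:00", "09:12:00"),
]


def to_seconds(hhmmss):
    """Convert an HH:MM:SS string to seconds since midnight."""
    t = datetime.strptime(hhmmss, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


def segment_stats(records):
    """Mean travel time and mean arrival delay per consecutive stop pair."""
    by_trip = defaultdict(list)
    for trip, stop, observed, scheduled in records:
        by_trip[trip].append((to_seconds(observed), stop, to_seconds(scheduled)))
    travel, delay = defaultdict(list), defaultdict(list)
    for events in by_trip.values():
        events.sort()  # order each trip's stops by observed arrival time
        for (o1, s1, _), (o2, s2, p2) in zip(events, events[1:]):
            travel[(s1, s2)].append(o2 - o1)  # observed travel time
            delay[(s1, s2)].append(o2 - p2)   # delay on arrival at s2
    return {
        seg: {"mean_travel_s": mean(travel[seg]), "mean_delay_s": mean(delay[seg])}
        for seg in travel
    }


def waiting_time_s(bus_arrival_s, train_departures_s):
    """Seconds until the next train; departures must be sorted ascending."""
    i = bisect_left(train_departures_s, bus_arrival_s)
    if i == len(train_departures_s):
        return None  # no later train in the timetable
    return train_departures_s[i] - bus_arrival_s
```

Calling `segment_stats(RECORDS)` returns a dictionary keyed by stop pairs such as `("Station", "StopA")`. A real analysis would additionally condition on time of day, weather, and day of the week, which this sketch leaves out.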

Recommendation and Data (Text) Mining

  1. Building a question-and-answer system (chatbot)
    • Identification of unanswerable questions (questions with different intentions or outside the domain)
      • Data used: Open data (text data)
      • Target: Questions with different intentions or out of domain from the prepared answers, answers corresponding to the questions
      • Description: We are developing a chatbot system in joint research with a company. This work includes a method that generates pseudo-out-of-domain questions using a GAN (Generative Adversarial Network) and a method that identifies out-of-domain questions using only information from within the domain of possible answers.
    • Estimation of speaker emotion
      • Data used: Open data (text data, multimodal data)
      • Target: Speaker's emotion
      • Description: Estimation of the speaker's emotion (sadness, joy, anger, normalcy, loneliness, happiness, etc.) from the speaker's textual expressions. The aim is to generate sentences (answers) that make dialogues with speakers smoother.
  2. Named entity recognition (NER) and relation recognition from patent documents
    • Data used: Patent documents (especially related to material science)
    • Target: Estimation of NER Tags (element, quantity, etc.) to be extracted
    • Description: NER using a small amount of supervised data. A paper is currently under submission, and we will provide more details after publication.
  3. Product data (image and text multimodal data) mining
    • Estimation of helpful vote counts for reviews
      • Data used: Open data
      • Target: Number of people who answered that a review was helpful (helpful vote count)
      • Description: Research on methods for estimating helpful vote counts that adapt to the distribution of those counts, focusing on the fact that this distribution changes.
    • Estimation of clicked products using multi-armed bandits
      • Data used: Licensed data
      • Target: Estimation of products to be clicked
      • Description: The effectiveness of the recommendation method used to estimate clicked products is affected by the user's usage history, the context (product category, etc.), and changes over time. With this in mind, we are researching and developing a multi-armed bandit policy that can respond quickly to changes in context and time.
    • Estimation of the degree of fraudulent reviews (cherry-picking)
      • Data used: Scraped data from e-commerce sites
      • Target: Fraudulent review rate (cherry-picking rate)
      • Description: We collect data from e-commerce sites by scraping and use the cherry-picking data from a cherry-checker as training labels (teacher data) to identify the undisclosed features needed to estimate the degree of cherry-picking. In addition, we will research and develop a personalized recommendation system that takes the degree of cherry-picking and other factors into account.
  4. Mining of large-scale text data on the Web
    • Data used: Open data, data collected through API contracts
    • Target: Estimate products, places, and types of products and places that users seek.
    • Description: Research on methods that use a Graph Convolutional Network (GCN) to acquire features useful for recommendation from data on the products, places, etc. that people interact with (select), and that use those features for recommendation and estimation.
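
To make the idea of a bandit policy that adapts to change more concrete, here is a minimal sketch of sliding-window UCB, one well-known approach to non-stationary bandits. It is not the policy developed in the laboratory. The sketch assumes that each arm is a candidate recommendation method and that the reward is 1 for a click and 0 otherwise; the class name, parameters, and the toy simulation are hypothetical.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """UCB computed only over the most recent `window` plays."""

    def __init__(self, n_arms, window=50, c=1.0):
        self.n_arms = n_arms
        self.c = c  # exploration strength
        self.history = deque(maxlen=window)  # (arm, reward); old plays drop out

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Any arm with no play left in the window is tried first.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        t = len(self.history)
        scores = [
            sums[a] / counts[a] + self.c * math.sqrt(math.log(t) / counts[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        self.history.append((arm, reward))


def simulate(steps=400, switch_at=200, window=50):
    """Toy environment: arm 0 is clicked before `switch_at`, arm 1 after."""
    policy = SlidingWindowUCB(n_arms=2, window=window)
    picks = []
    for t in range(steps):
        best = 0 if t < switch_at else 1
        arm = policy.select()
        policy.update(arm, 1.0 if arm == best else 0.0)
        picks.append(arm)
    return picks


if __name__ == "__main__":
    picks = simulate()
    print("share of arm 1 in the last 100 steps:", picks[300:].count(1) / 100)
```

Because rewards older than the window are forgotten, the policy moves to the newly better arm once the stale observations have slid out. A contextual variant would keep a separate window per context, such as product category.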

Keywords: Deep Learning, Machine Learning, Natural Language and Image Processing, Text and Data Mining, Recommendation

Our research activities can be seen on the following page: Recent Papers and Books