Sep 4, 2014

IMS student members win Data Mining Cup

Iowa State student team wins international data mining competition

A team of Iowa State University graduate students topped 98 other universities from 28 countries to capture first place in the 15th annual Data Mining Cup. Prudsys AG, a leading European data mining company, sponsors the intelligent-data analysis competition for universities. According to Prudsys, the competition is meant to be a “bridge between university and industry to identify the best up-and-coming data miners.”

Teams had six weeks to develop a solution for a data mining problem about optimal return prognosis. This year, teams had to use an unidentified online store’s historical purchase data to create a model for new orders that predicts the probability of a purchase being returned.

“The motivation for this contest data is that some online retailers offering free return shipping have almost half of their orders returned,” said Iowa State’s team leader and statistics Ph.D. candidate Cory Lanker. “We could advance our ideas to create an application that helps online retailers reduce returned shipments and increase profit margins,” he said.

Between April 2 and May 14, teams worked at their respective universities to develop their probability predictions. “Teams submitted return probabilities for approximately 50,000 purchases made in one month using data from approximately 481,000 orders from the previous 12 months,” Lanker said. “They used 12 variables that characterize the customer information—such as age, location and purchase history—and information about ordered items—such as size, color, price, etc.” Lanker said that the basis of Iowa State’s technical solution was “to fully characterize customer behavior, which we did using advanced statistical learning concepts on the provided history of purchases. Once we successfully characterized customer behavior, we could then best predict whether a new purchase would be returned.”
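To make the idea of characterizing customer behavior from purchase history concrete, here is a minimal sketch. It is not the Iowa State team's method, which the article does not detail. It assumes a hypothetical history of (customer, returned) pairs and estimates each customer's return probability as their observed return rate shrunk toward the overall rate. The function names and the `prior_strength` parameter are illustrative assumptions.

```python
from collections import defaultdict


def fit_return_rates(history, prior_strength=5.0):
    """Estimate smoothed per-customer return rates.

    history: iterable of (customer_id, returned) pairs, returned is 0 or 1.
    prior_strength: hypothetical tuning knob; acts like that many
    pseudo-orders at the overall return rate.
    """
    history = list(history)
    if not history:
        raise ValueError("history must contain at least one order")

    # Overall share of returned orders, used as the fallback estimate.
    global_rate = sum(returned for _, returned in history) / len(history)

    # customer_id -> [number of orders, number of returns]
    counts = defaultdict(lambda: [0, 0])
    for customer_id, returned in history:
        counts[customer_id][0] += 1
        counts[customer_id][1] += returned

    # Shrink each customer's raw rate toward the overall rate, so that
    # customers with few orders do not get extreme estimates.
    rates = {
        cid: (n_returns + prior_strength * global_rate) / (n_orders + prior_strength)
        for cid, (n_orders, n_returns) in counts.items()
    }
    return global_rate, rates


def predict_return_probability(customer_id, global_rate, rates):
    """Return the smoothed rate, or the overall rate for unseen customers."""
    return rates.get(customer_id, global_rate)


# Toy data (made up for illustration, not contest data).
history = [("a", 1), ("a", 1), ("a", 0), ("b", 0), ("b", 0), ("c", 1)]
global_rate, rates = fit_return_rates(history)
```

A competitive entry would go well beyond this, combining customer and item variables in a statistical learning model; the sketch only illustrates how past behavior can inform the prediction for a new order.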

“This was specifically a student contest,” said Steve Vardeman, University Professor of statistics and industrial engineering. “The team had no direct faculty input on the problem. They organized and executed their solution entirely on their own.”

A jury scored all 57 submitted solutions (not all teams submitted a solution), and invited the top ten teams to Berlin to present their solution methods at the Prudsys User Days conference. Each team gave a ten-minute presentation.

Iowa State team members and their departments are Guillermo Basulto-Elias (statistics), Fan Cao (statistics), Xiaoyue Cheng (statistics), Marius Dragomiroiu (computer science), Jessica Hicks (bioinformatics and computational biology), Cory Lanker (statistics), Ian Mouzon (statistics), Lanfeng Pan (statistics) and Xin Yin (bioinformatics and computational biology/statistics).

Basulto-Elias, Yin and Lanker went to Berlin for the presentation and announcement. Final team rankings were announced beginning with tenth place.

“Before long, fifth place was announced and it wasn’t us, so I knew we did better this year,” Lanker said. “When it was down to two teams, [Prudsys organizer] Jens Scholz said, ‘The United States lost in the World Cup last night,’ and I thought, ‘Well, this is us, we finished second,’ but then he added, ‘But a United States team has won the 2014 Data Mining Cup!’”

Lanker says the shock has not worn off yet. He attributes the team’s success to multiple weekly team meetings that were well attended at the end of the semester, demonstrating the “dedication we all had to our team’s success.”

“As a leader, I stressed sticking to a schedule so we didn’t run out of time, and involving everyone in discussions about making the many important statistical decisions,” Lanker said. “The level of teamwork was extraordinary … with many large contributions from all members.”

http://www.data-mining-cup.de/fileadmin/templates/img/DMC/DMC_2014/Preistraeger/image1.jpg

[l-r]: IMS members Guillermo Basulto-Elias and Xin Yin, together with Iowa State’s team leader Cory Lanker, proudly hold their first prize in the international Data Mining Cup, beating 124 other teams that came from 99 universities in 28 countries. See http://www.data-mining-cup.de/en/dmc-competition/winner/
