NRI's Data Scientists Attempt to Predict NFT Prices (Part 1)

NFT Price Prediction Challenge (Road to "Appraiser" of Digital Art)

Hello everyone, we are Naohiro Manabe, Ryuichi Kikkawa, Kazuteru Hirahara, and Ryo Nakai, data scientists on the NFT analysis team at Nomura Research Institute, Japan.

"NFT" has been attracting a lot of attention recently, and many of you may be wondering what it is. As explained in more detail in a later section, an NFT is a digital asset whose "uniqueness" is guaranteed by blockchain technology.

Physical works of art, such as Van Gogh's "Sunflowers," can become very expensive because each one is one of a kind. Digital works, by contrast, can be reproduced easily and have often had to be supplied almost for free (as in the music industry), making it difficult to pay artists appropriately. NFTs have the potential to break out of this situation.

Famously, "Everydays - The First 5000 Days," a digital work of art by Beeple, sold for approximately 7 billion yen. "Alternate dimension - 幻想絢爛 (Gensokenran)" by Japanese VR artist Aimi Sekiguchi sold for approximately 13 million yen as soon as it came on the market. As more digital works of art are offered as NFTs, more and more people are starting to feel that owning them is just as valuable as owning physical works of art.

We see great potential in NFTs too, but we have little knowledge of art. However, we wondered if we could try something interesting from the perspective of data science, which is our expertise.

So we decided to try our hand at predicting NFT prices with the power of data science, analyzing data on NFT historical transaction prices and using statistics and machine learning techniques to uncover the mechanisms behind the price calculations.

In the world of physical art, there is a profession called "appraiser." Appraisers not only verify the authenticity of an artwork, but also assess how much it is worth from various perspectives, such as rarity of design and historical significance. By writing this article, we aim to become, so to speak, "appraisers" in the world of NFTs.

In this first post, we will show how to collect data from an NFT marketplace using an API and how to perform a simple analysis. We hope anyone interested in NFTs or data science will read and enjoy this post.

What is NFT anyway?

NFT stands for Non-Fungible Token.

Non-fungible means that something is irreplaceable and has unique value. For example, virtual currencies such as Bitcoin and legal tender such as Japanese yen are fungible: there is no distinction between two 10,000 yen notes, and each is worth exactly 10,000 yen. Now, what about baseballs? A baseball autographed by Shohei Otani is different from an ordinary baseball; it is irreplaceable and has unique value. On the other hand, some things are unique but carry no extra value: a ball autographed by an ordinary person would be worth the same as, or less than, an ordinary ball.

A token is a type of digital content issued on an existing blockchain. Among digital content issued with blockchain technology, there is also a type called a "coin," which is issued on its own blockchain and in principle has no issuer. Bitcoin, for example, has a predetermined cap of 21 million coins, and under its proof-of-work (PoW) mechanism, bitcoins are issued only to those who successfully generate a new block. Tokens, on the other hand, are issued on an existing blockchain, such as Ethereum, and the number of tokens to be issued can be controlled. Furthermore, in addition to their monetary properties, tokens allow issuers to define their own added value, such as giving token holders the right to vote on community decisions.

So, what are the benefits of converting works of art to NFTs? An NFT uses a mechanism called a "smart contract," running on a blockchain such as Ethereum, to record information about the work of art, its original owner, and who has bought it. This ensures that digital art that has been turned into an NFT (so-called "NFTed" art) is an original, not a copy. The blockchain also allows all transactions to be traced, so that, for example, 20% of the proceeds from each NFT transaction can be distributed to the creator. (In fact, major marketplaces such as OpenSea make it easy to set up such a mechanism.) Previously, artists could earn income only in the primary market, where they first sell their art, but now they can also earn income in the secondary market every time their work is resold.

NFT Marketplace

OpenSea is the most famous NFT marketplace, accounting for more than 90% of the global market share. Users can buy and sell items using wallets such as MetaMask and virtual currencies such as Ethereum. Sellers can choose between fixed-price and auction formats.

opensea.io

Other well-known marketplaces include Rarible and SuperRare, each of which has its own characteristics, such as issuing its own governance tokens or limiting the number of creators who can sell their work.

In addition, several marketplaces have already opened in Japan, including Adam byGMO operated by GMO, NFT Studio operated by Coincheck, and LINE NFT operated by LINE. Several domestic marketplaces also allow NFT to be purchased using Japanese yen.

What are “Collectibles”?

In the NFT marketplace, a variety of NFTs are traded: not only works of art, but also items for games and the metaverse, and even videos and music. In this article, we focus on the field called “Collectibles.”

Collectibles are series of NFT artworks created around a certain concept, like trading cards, for the purpose of owning and collecting. Famous examples such as Bored Ape Yacht Club (drawings of apes) and CryptoPunks (pixel-art portraits) are algorithmically generated; they are created not by a single artist but by a team of designers and engineers. A collection often contains about 10,000 pieces.

Bored Ape Yacht Club was launched in April 2021 and quickly accelerated the NFT movement; Bored Ape Yacht Club and CryptoPunks are at the top of the market value rankings for collectibles (as of this writing). CryptoPunks was originally developed by Larva Labs, but is now operated by Yuga Labs, the company that launched Bored Ape Yacht Club.

In 2022, the year after NFT art effectively took off, a collection inspired by Japanese anime called Azuki was launched by five men in their 30s living in Los Angeles. Its 8,700 NFTs sold out in three minutes, generating over $29 million. Azuki remains popular today, ranking in the top 10 by market value. Azuki holders can participate in a community called The Garden, which is expected to expand into the metaverse in the future.

Let’s dive in!

From here, we will show how to obtain actual NFT transaction data from marketplaces and take on the challenge of building a model to predict NFT prices.

The “floor price” is an important concept for NFTs: it indicates the lowest price within a particular collection. You can check floor prices on the web using "NFT Price Floor."

nftpricefloor.com
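
As a minimal sketch of the floor price definition above: given the current asking prices in one collection, the floor price is simply their minimum. The item names and prices below are made-up values for illustration, not real market data, and `floor_price` is a hypothetical helper.

```python
# Hypothetical asking prices (in ETH) for items currently listed in one collection
listings = {"#101": 1.25, "#2048": 0.98, "#77": 3.40}

def floor_price(asking_prices):
    """Return the lowest asking price among the listed items."""
    return min(asking_prices.values())

print(floor_price(listings))
```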

We believe that in creating a model to predict NFT prices, in addition to the attribute data of individual NFTs, information on all past transactions such as Mint (creation), Sale (buying and selling), and Transfer (transfer/resale) is necessary.

There are two major options for automatically obtaining this information: "web scraping" and "acquisition through an API." Web scraping can be used to acquire data from virtually any website, but some services explicitly prohibit it in their terms and conditions. Also, if done carelessly, web scraping can overload servers. Acquisition through an API may require more time and effort, for example to obtain an API key, but it can be used with peace of mind because it is an official way of acquiring data provided by the service operator.

This time, we adopted the method of acquiring data through API and decided to use LooksRare as the data source, whose API can be used without registration. (OpenSea is a well-known marketplace, but we did not use OpenSea's API this time because it was difficult to obtain an API key.)

LooksRare is a relatively new NFT marketplace that launched its service in January 2022. It indexes all NFTs on the Ethereum blockchain, including those traded on OpenSea, so transaction history data for NFTs listed on OpenSea can also be retrieved using LooksRare's API.

Consideration of necessary data

We will examine the data required to perform the analysis for price estimation.

1. NFT Attribute Data

Attributes related to the NFT itself are necessary. Although it is possible to input image data into the model, due to the amount of computation, it is preferable to have attributes such as "color" and "shape" in a format that allows them to be treated as text.

2. Transaction history data

Due to the nature of the blockchain, transaction history is publicly available and does not necessarily need to be obtained from the marketplace, but for implementation convenience, it is desirable that it be available from the same source.

3. Transaction progress data

Since marketplace transactions are often conducted in the form of auctions, the initial price given by the seller at the time of listing or the names of bidders and their bidding prices during the process are also important data. Since the blockchain only records the final transaction results, such data must be obtained from the marketplace.

Data Acquisition

In order to retrieve data through the LooksRare API, we need to pass parameters in a specific format to a specific URL (“endpoint”). In this case, we will be looking for an API endpoint to retrieve attributes and transaction history.

Referring to the API documentation, we see that the following endpoints exist:

Retrieve details (attributes) for individual data
- https://api.looksrare.org/api/v1/tokens?collection=[collection_id]&tokenId=[token_id]
Retrieve transaction history for individual data
- https://api.looksrare.org/api/v1/events?collection=[collection_id]&tokenId=[token_id]&type=[transaction_type]

Here, collection_id, token_id, and transaction_type are parameters: collection_id is the ID of a collection containing multiple NFTs, token_id is the ID of a single NFT within that collection, and transaction_type is the type of transaction.
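
To make the parameter passing concrete, here is a minimal sketch that assembles the two URLs from the endpoint patterns listed above. The helper `build_url` and the all-zero collection address are hypothetical and used only for illustration; no request is actually sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.looksrare.org/api/v1"

def build_url(endpoint, **params):
    """Join the base URL, an endpoint name, and URL-encoded query parameters."""
    return "%s/%s?%s" % (BASE_URL, endpoint, urlencode(params))

# Placeholder collection address, for illustration only
collection_id = "0x0000000000000000000000000000000000000000"

token_url = build_url("tokens", collection=collection_id, tokenId=1)
events_url = build_url("events", collection=collection_id, tokenId=1, type="SALE")

print(token_url)
print(events_url)
```

Building the query string with `urlencode` rather than manual string formatting makes it harder to forget a parameter or to mis-escape a value.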

For this analysis, we selected the "Beanz" collectible, which is produced by the same team as Azuki mentioned earlier. We chose this collectible because its price level is not too high compared to the original "Azuki" and it has more extensive attribute data.

The following code obtains "Beanz" information from the API and stores it as json files.

import os
import json
import glob
import requests

BASEDIR = '.'   # assumption: base directory for the saved files; change as needed
proxies = None  # assumption: set a requests-style proxy dict if you are behind a proxy

collection_id = '0x306b1ea3ecdf94aB739F1910bbda052Ed4A9f949'

# Make sure the output directory exists before writing to it
outdir = os.path.join(BASEDIR, 'looksrare_beanz')
os.makedirs(outdir, exist_ok=True)

for i in range(1000): # loop token_ids
    print(i)
    try:
        # Fetch attributes of each token
        api_url = 'https://api.looksrare.org/api/v1/tokens?collection=%s&tokenId=%d' % (collection_id, i)
        filename = os.path.join(outdir, str(i) + '_token.json')
        if not os.path.exists(filename):
            r = requests.get(api_url, proxies=proxies)
            data = json.loads(r.text)
            if data["success"]:
                with open(filename, 'w') as f:
                    f.write(r.text)
            else:
                # If token does not exist, skip to next token
                continue

        # Fetch transaction logs
        for t in ['SALE','TRANSFER','MINT']:
            # The transaction type must be passed as the third format argument
            api_url = 'https://api.looksrare.org/api/v1/events?collection=%s&tokenId=%d&type=%s' % (collection_id, i, t)
            filename = os.path.join(outdir, str(i) + '_' + t + '.json')
            if not os.path.exists(filename):
                r = requests.get(api_url, proxies=proxies)
                data = json.loads(r.text)
                if data["success"]:
                    with open(filename, 'w') as f:
                        f.write(r.text)
    except Exception as e:
        # Report the failure instead of silently hiding it, then move on
        print('token %d failed: %s' % (i, e))

Processing of acquired data

Next, a data mart for analysis is created based on the information obtained from the API.

Since json files for "NFT attribute data," "transaction history data," and "data on the progress of transactions" exist for each item, only the fields necessary for data analysis are extracted from each json file and merged into a single data frame. The following is an example of code to create a data frame for "NFT attribute data."

First, from the attribute data in each item's json file, only the information necessary for analysis, such as the unique item ID (tokenId) and the characteristics (attributes) of each item, is extracted, and an individual data frame is generated from it. Finally, these are combined into a single data frame that aggregates information on all items. In the same way, data frames are created for "transaction history data" and "data on the progress of transactions," each including the unique item ID, so that a data mart for analysis can be built by matching each item's attribute data with its transaction data.

import os
import glob
import pandas as pd
from tqdm import tqdm
import joblib

# BASEDIR is the base directory used when downloading the json files.
# Assumption: the output is written to the same place; change as needed.
OUTPUTDIR = BASEDIR

# List all json files that contain attributes of each NFT
paths_token = glob.glob(os.path.join(BASEDIR, 'looksrare_beanz', '*_token.json'))

dfs = []
# Make a dataframe of each item's attributes
for path_token in tqdm(paths_token):
    try:
        # Rows are the keys of the "data" object; the "data" column holds their values
        df = pd.read_json(path_token)

        id_columns = ["id"]
        id_values = [df.loc["id", "data"]]

        image_columns = ["imageURI"]
        image_values = [df.loc["imageURI", "data"]]

        token_columns = ["tokenId"]
        token_values = [df.loc["tokenId", "data"]]

        # The first row of the "data" column is expected to hold the list of traits
        traits = pd.DataFrame(df["data"].iloc[0]).T
        attribute_columns = list(traits.loc["traitType", :])
        attribute_values = list(traits.loc["value", :])

        columns = id_columns + token_columns + image_columns + attribute_columns
        values = id_values + token_values + image_values + attribute_values

        df_token = pd.DataFrame(columns=columns)
        df_token.loc[0] = values

        dfs.append(df_token)

    except Exception as e:
        # Skip files that cannot be parsed, but report them
        print('%s skipped: %s' % (path_token, e))

# Concatenate all data and save to pickle file
token_df = pd.concat(dfs, axis=0).reset_index(drop=True)
joblib.dump(token_df, os.path.join(OUTPUTDIR, 'token_df.pkl'))
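
The matching step described above, joining each item's attributes to its transactions via the item ID, might look like the following sketch. The column name `price` and all sample rows are hypothetical; the real data frames would come from the json files.

```python
import pandas as pd

# Hypothetical attribute table: one row per NFT
token_df = pd.DataFrame({
    "tokenId": [1, 2, 3],
    "Type": ["Red Bean", "Blue Bean", "Red Bean"],
})

# Hypothetical transaction table: one row per sale (token 3 was never sold)
sale_df = pd.DataFrame({
    "tokenId": [1, 1, 2],
    "price": [1.2, 1.5, 0.9],
})

# Attach each item's attributes to every one of its transactions
mart_df = sale_df.merge(token_df, on="tokenId", how="left")
print(mart_df)
```

A left join from the transaction table keeps one row per transaction, so items that were never traded simply do not appear in the result.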

Transaction Price Analysis

Now that the data mart for the analysis has been created, let's check the trend of transaction prices of Azuki "Beanz" acquired through the API.

Of the 19,932 "Beanz" NFTs acquired through the API this time, 1,653 (about 8% of the total) have been traded at least once. So, at what price were the NFTs actually traded? Based on the distribution of transaction prices (converted to Japanese yen), the most common price range is 10,000-100,000 yen, followed by 100,000-300,000 yen (the visualization excludes outliers, but the highest price is almost 2 million yen!).
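
As a rough sketch of the two calculations above, the snippet below computes the traded share from the counts given in the text and then groups prices into ranges with `pd.cut`. The price values and the bin labels are made up for illustration; only the two counts come from the article.

```python
import pandas as pd

# Counts taken from the text
traded, total = 1653, 19932
share = traded / total
print("%.1f%%" % (share * 100))

# Hypothetical transaction prices in yen
prices_jpy = pd.Series([5_000, 20_000, 50_000, 150_000, 250_000, 1_900_000])

# Price ranges similar to those discussed in the text
bins = [0, 10_000, 100_000, 300_000, float("inf")]
labels = ["under 10k", "10k-100k", "100k-300k", "over 300k"]

binned = pd.cut(prices_jpy, bins=bins, labels=labels)
counts = binned.value_counts().sort_index()
print(counts)
```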

This bimodality in transaction prices is assumed to be due to fluctuations in the Ethereum price. In fact, in early May 2022, prices across the entire virtual currency market fell, due in part to the tightening of monetary policy in the United States. If we visualize the transaction prices separately for the periods before and after that time, the two distributions separate clearly.

To remove the effects of market fluctuations as far as possible and to better examine the value of the NFT image itself, the subsequent analyses use only transaction data from before early May 2022.
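
The split by period can be sketched as follows. The records are invented, and the cutoff of 2022-05-01 is an assumption, since the text only says "early May 2022."

```python
import pandas as pd

# Hypothetical sales records; timestamps and prices are made up
sales = pd.DataFrame({
    "tokenId": [1, 2, 3, 4],
    "timestamp": pd.to_datetime(
        ["2022-04-10", "2022-04-28", "2022-05-15", "2022-06-01"]
    ),
    "price_jpy": [250000, 180000, 60000, 45000],
})

# Assumed cutoff date for "early May 2022"
cutoff = pd.Timestamp("2022-05-01")

before = sales[sales["timestamp"] < cutoff]
after = sales[sales["timestamp"] >= cutoff]

print(len(before), len(after))
print(before["price_jpy"].median(), after["price_jpy"].median())
```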

Relationship between the features of each NFT image and the transaction price

In Azuki "Beanz," each image feature is tagged in "Properties." As shown in the sample below, if the character wears big glasses, it is tagged as "Big Glasses" in the "Properties" item "FACE".

source: Bean #17601 - BEANZ Official | OpenSea

Since these "Properties" features are managed as common tags within "Beanz," we will proceed with the analysis using these common tag items in order to check the relationship between the transaction price and features of each image.

First, we checked the number of items and the average transaction price for each value of "Type," the "Properties" item that indicates the general type of image. "Type" classifies the image characters by color, such as "Red Bean" and "Blue Bean." The results show that items with types such as "Golden Bean" and "Sprit Bean," of which only a few dozen exist in the whole collection, tend to be traded at higher prices. This result alone suggests that items are traded according to the market principle that “the rarer the item is, the more expensive it is.”
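
An aggregation of this kind can be sketched with a pandas `groupby`. The rows and the column name `price_jpy` are hypothetical and only illustrate the shape of the calculation, not the actual results.

```python
import pandas as pd

# Hypothetical data mart: one row per transaction, with the item's "Type"
mart = pd.DataFrame({
    "tokenId": [1, 2, 3, 4, 5],
    "Type": ["Red Bean", "Red Bean", "Blue Bean", "Blue Bean", "Golden Bean"],
    "price_jpy": [100000, 120000, 90000, 110000, 900000],
})

# Number of distinct items and average transaction price per type
summary = (
    mart.groupby("Type")
        .agg(items=("tokenId", "nunique"), mean_price=("price_jpy", "mean"))
        .sort_values("items")
)
print(summary)
```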

The following are examples of "Golden Bean" and "Sprit Bean" items, which have the appearance of "rare" items.

source: Bean #11515 - BEANZ Official | OpenSea

source: Bean #19740 - BEANZ Official | OpenSea

Next, we checked the number of items and the average transaction price for each value of "Eyes," the "Properties" item that indicates eye type. The analysis shows that items with rare "Eyes" characteristics such as "Lightning" and "Sprit - Determind" tend to be traded at higher prices, while items with "Sprit - Dots", "Sprit - Closed" and "Fire", which are also rare, are traded at lower prices.

This result suggests that rare features do not always lead to higher prices. Among the rare features, some seem to be particularly effective at pushing transaction prices up. Alternatively, the combination of certain rare features may contribute significantly to transaction prices, so further detailed analysis may be required.
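
One possible starting point for that further analysis is to group by a combination of features rather than a single one. The sketch below uses invented rows and a hypothetical `price_jpy` column; it shows the technique only and says nothing about the real data.

```python
import pandas as pd

# Hypothetical transactions with two feature columns
mart = pd.DataFrame(
    [
        ("Red Bean", "Lightning", 300000),
        ("Red Bean", "Fire", 80000),
        ("Blue Bean", "Lightning", 150000),
        ("Blue Bean", "Fire", 70000),
        ("Red Bean", "Lightning", 320000),
        ("Blue Bean", "Fire", 90000),
    ],
    columns=["Type", "Eyes", "price_jpy"],
)

# Number of sales and average price for each (Type, Eyes) combination
combo = mart.groupby(["Type", "Eyes"])["price_jpy"].agg(
    sales="count", mean_price="mean"
)
print(combo)
```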

Thank you for taking the time to read this article to the end. In this article, we have given an overview of NFTs, explained how to acquire data from marketplaces, and shown some simple visualizations. We hope you will try your hand at playing with the data, using this article as a reference.

Next time, we will work on building a price prediction model for NFT.

Reference (Japanese)