Google TensorFlow Certification 04
Category 2 - Deep Neural Network Models (Structured Data)
Using TensorFlow Datasets
How to load data
Excerpt from the tfds.load documentation
The easiest way of loading a dataset is tfds.load. It will:
- Download the data and save it as tfrecord files.
- Load the tfrecord and create the tf.data.Dataset.
ds = tfds.load('mnist', split='train', shuffle_files=True)
...
Some common arguments:
- split=: Which split to read (e.g. 'train', ['train', 'test'], 'train[80%:]', …). See our split API guide.
- shuffle_files=: Control whether to shuffle the files between each epoch (TFDS store big datasets in multiple smaller files).
- data_dir=: Location where the dataset is saved (defaults to ~/tensorflow_datasets/)
- with_info=True: Returns the tfds.core.DatasetInfo containing dataset metadata
- download=False: Disable download
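As a quick illustration of these arguments, here is a minimal sketch using the iris dataset from the next section; the variable names ds and info are just for illustration, and with_info=True makes tfds.load return a (dataset, info) pair.
import tensorflow_datasets as tfds

# with_info=True returns (tf.data.Dataset, tfds.core.DatasetInfo)
ds, info = tfds.load('iris',
                     split='train',
                     shuffle_files=True,
                     with_info=True)

print(info.splits['train'].num_examples)  # 150
print(info.features)                      # FeaturesDict with 'features' and 'label'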
Analyzing the iris data
Excerpt from the iris dataset documentation
- Description:
This is perhaps the best known database to be found in the pattern recognition literature. Fisher’s paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
- Homepage: https://archive.ics.uci.edu/ml/datasets/iris
- Source code: tfds.structured.Iris
- Versions:
    - 2.0.0 (default): New split API (https://tensorflow.org/datasets/splits)
 
- Download size: 4.44 KiB
- Dataset size: Unknown size
- Auto-cached (documentation): Unknown
- Splits:
| Split | Examples | 
|---|---|
| 'train' | 150 | 
***Note that the dataset only has a 'train' split (we will split it into train/valid data later).***
- Features:
FeaturesDict({
    'features': Tensor(shape=(4,), dtype=tf.float32),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
})
***Note that there are 4 features and the label has 3 classes.***
- Supervised keys (See as_supervised doc): ('features', 'label')
- Figure (tfds.show_examples): Not supported.
- Examples (tfds.as_dataframe):
- Citation:
@misc{Dua:2019,
author = "Dua, Dheeru and Graff, Casey",
year = "2017",
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and Computer Sciences"
}
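To confirm the structure described above in code, one can peek at a single raw example before any preprocessing. This is a small sketch (the ds/info names are just for illustration); the printed shapes follow from the FeaturesDict shown above.
import tensorflow_datasets as tfds

ds, info = tfds.load('iris', split='train', with_info=True)

print(info.features['features'].shape)     # (4,)
print(info.features['label'].num_classes)  # 3

# Each raw example is a dict with 'features' and 'label' keys
for example in ds.take(1):
    print(example['features'].shape, example['label'])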
Hands-on Practice (Iris)
Step 1. Import
import numpy as np 
import tensorflow as tf 
from tensorflow.keras.layers import Dense, Flatten 
from tensorflow.keras.models import Sequential 
from tensorflow.keras.callbacks import ModelCheckpoint 
import tensorflow_datasets as tfds
Step 2. Preprocessing
# Load the data to preprocess (TensorFlow Datasets - iris)
# load('dataset name', split='from the start of the train data up to 80%') -> used as train_dataset
train_dataset = tfds.load('iris', split='train[:80%]')
# load('dataset name', split='from 80% of the train data to the end') -> used as valid_dataset
valid_dataset = tfds.load('iris', split='train[80%:]')
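As a sanity check (not part of the original recipe), the 80/20 slicing can be verified by counting the examples in each split:
# Count examples by iterating once over each split
print(sum(1 for _ in train_dataset))  # 120 (80% of 150)
print(sum(1 for _ in valid_dataset))  # 30  (20% of 150)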
Preprocessing requirements
- One-hot encode the label values
- Split the data into features (x) and labels (y)
# Create the preprocessing function
def preprocess(data):
    x = data['features']  # assign the features from data to x
    y = data['label']     # assign the label from data to y
    y = tf.one_hot(y, 3)  # one-hot encode the label (y) into 3 classes
                          # e.g. [1, 0, 0], [0, 1, 0], [0, 0, 1]
    return x, y
# Apply the preprocessing function
# train_dataset.map(<apply this function>).batch(<batch size>)
batch_size = 10
train_data = train_dataset.map(preprocess).batch(batch_size)
valid_data = valid_dataset.map(preprocess).batch(batch_size)
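To make sure the mapped and batched pipeline produces what the model expects, it helps to inspect one batch (a sketch; the shapes follow from batch_size=10, 4 features, and 3 one-hot classes):
# x: (10, 4) float32 features, y: (10, 3) one-hot labels
for x_batch, y_batch in train_data.take(1):
    print(x_batch.shape, y_batch.shape)  # (10, 4) (10, 3)
    print(y_batch[0])                    # e.g. [1. 0. 0.]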
Step 3. Modeling
model = Sequential([
    # input_shape: there are 4 features, so (4,) or [4]
    Dense(512, activation='relu', input_shape=(4,)),
    Dense(256, activation='relu'),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax'), 
    # Classification with 3 label classes -> not binary classification, so activation='softmax'
])
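model.summary() is a quick way to verify the layer stack and confirm that the last layer outputs 3 values (one probability per class):
# Verify the architecture: the final layer's output shape should be (None, 3)
model.summary()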
Step 4. Compile
# optimizer = 'adam' (Adam is a common, strong default optimizer for this kind of classification task)
# loss = 'categorical_crossentropy' (labels are one-hot encoded and the model's last activation is softmax)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
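The pairing of a softmax output with categorical_crossentropy and one-hot targets can be checked with a tiny hand-computed example (an aside, not part of the original notebook):
# categorical_crossentropy compares one-hot targets with softmax probabilities:
# loss = -log(probability assigned to the true class)
y_true = tf.constant([[1., 0., 0.]])     # one-hot label for class 0
y_pred = tf.constant([[0.7, 0.2, 0.1]])  # softmax-style output
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())  # ~0.357 = -log(0.7)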
Step 4.5. Create a ModelCheckpoint
checkpoint_path = "my_checkpoint.ckpt" # local checkpoint path, e.g. name.ckpt or name.h5
checkpoint = ModelCheckpoint(filepath=checkpoint_path,
                             save_weights_only=True, # save only the weights
                             save_best_only=True,    # keep only the best result
                             monitor='val_loss',     # criterion: lowest validation loss
                             verbose=1)              # print progress
Step 5. Fit
# Train: model.fit(train data, validation_data, epochs, callbacks=[checkpoint])
history = model.fit(train_data,
                    validation_data=valid_data,
                    epochs=20,
                    callbacks=[checkpoint])
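The returned History object records the per-epoch metrics, so the epoch with the lowest validation loss (the one the checkpoint kept) can be found afterwards (a small sketch):
import numpy as np

val_loss = history.history['val_loss']
best_epoch = int(np.argmin(val_loss))
print(f"best epoch: {best_epoch + 1}, val_loss: {val_loss[best_epoch]:.4f}")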
Step 5.5. Load Weights from the Checkpoint
# Without this line, there is no point in creating the checkpoint (the best weights would be saved but never used)
model.load_weights(checkpoint_path)
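To confirm that the best weights were actually restored, the model can be evaluated on the validation data (a sketch; the exact numbers will vary from run to run):
# Evaluate the restored best weights on the validation data
loss, acc = model.evaluate(valid_data)
print(f"val_loss: {loss:.4f}, val_acc: {acc:.4f}")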