[Machine Learning-Tensorflow] Lec-08 Reading Data from Files with TensorFlow


감자형 2018. 3. 11. 17:29

1. Loading data from file

1) In this lecture we will look at how to load data stored in files and use it.

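We will use the data-01.csv file, so prepare it first. Each row holds four comma-separated numbers: the first three are the inputs (x) and the last one is the value to predict (y). The lab code in section 2 opens this data under the name data-01-test-score.csv, so save the file under that name if you want to run that code unchanged.

* data-01.csv file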

73,80,75,152
93,88,93,185
89,91,90,180
96,98,100,196
73,66,70,142
53,46,55,101
69,74,77,149
47,56,60,115
87,79,90,175
79,70,88,164
69,70,73,141
70,65,74,141
93,95,91,184
79,80,73,152
70,73,78,148
93,89,96,192
78,75,68,147
81,90,93,183
88,92,86,177
78,83,77,159
82,86,90,177
86,82,89,175
78,83,85,175
76,83,71,149
96,93,95,192

2) With the data file ready, let's look at how to read it.

We will use numpy, a Python library.

ex1)

xy = np.loadtxt('data-01.csv', delimiter=',', dtype=np.float32)  # this is how it is used (with numpy imported as np)
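A minimal sketch to check how np.loadtxt behaves, assuming only numpy is installed. To keep it runnable without the file, the first three rows of the data are held in an in-memory io.StringIO object; np.loadtxt accepts a file-like object as well as a file name. Note that the keyword is spelled delimiter: numpy rejects an unknown keyword such as delimeter with a TypeError.

```python
import io

import numpy as np

# The first three rows of data-01.csv, kept in memory so no file is needed.
csv_text = io.StringIO(
    "73,80,75,152\n"
    "93,88,93,185\n"
    "89,91,90,180\n"
)

# Same call as in the post; a path or a file-like object both work.
xy = np.loadtxt(csv_text, delimiter=',', dtype=np.float32)
print(xy.shape)  # (3, 4)
print(xy.dtype)  # float32
```

With the real file, passing 'data-01.csv' instead of csv_text gives an array of shape (25, 4).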

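Next, let's look at slicing, one of Python's most powerful features.

We have used slicing before, so when going through the code I will just add short comments and move on.

If you are not familiar with slicing, I recommend reading up on it first.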

ex2)

x_data = xy[:,0:-1]

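Explanation: the part before the comma is just ":" with no start or end, so it selects every row (all N rows).

After the comma, 0:-1 selects columns from index 0 up to, but not including, index -1 (the last column), i.e. every column except the last one.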

y_data = xy[:, [-1]]

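Explanation: the ":" before the comma again selects all N rows, and [-1] selects only the last column. Because the column index is given as a list, the result stays two-dimensional with shape (N, 1), which matches the [None, 1] placeholder used for Y in the lab code.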
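The difference between selecting the last column with -1 and with [-1] is easiest to see by printing shapes. A small sketch, assuming numpy, with three rows of the data standing in for the full file:

```python
import numpy as np

# Three rows of the data: three input columns followed by one target column.
xy = np.array([[73, 80, 75, 152],
               [93, 88, 93, 185],
               [89, 91, 90, 180]], dtype=np.float32)

x_data = xy[:, 0:-1]  # all rows, columns 0..2 (last column excluded)
y_flat = xy[:, -1]    # all rows, last column, as a 1-D array
y_data = xy[:, [-1]]  # all rows, last column, kept 2-D

print(x_data.shape)  # (3, 3)
print(y_flat.shape)  # (3,)
print(y_data.shape)  # (3, 1)
```

The (N, 1) form is the one that lines up with a [None, 1] placeholder, which is why the lab code uses [-1].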

2. Lab code

# Lab 4 Multi-variable linear regression
# Written for the TensorFlow 1.x API (tf.placeholder, tf.Session)
import tensorflow as tf
import numpy as np

tf.set_random_seed(777)  # for reproducibility

# Read the data file (the data-01.csv prepared above, saved as data-01-test-score.csv)
xy = np.loadtxt('data-01-test-score.csv', delimiter=',', dtype=np.float32)

# Slicing: every column except the last is input, the last column is the target
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

# Make sure the shape and data are OK
print(x_data.shape, x_data, len(x_data))
print(y_data.shape, y_data)

# placeholders for a tensor that will be always fed.
X = tf.placeholder(tf.float32, shape=[None, 3])  # 3 input values (x)
Y = tf.placeholder(tf.float32, shape=[None, 1])  # 1 output value (y)

# Think of it as 3 values going in (X) and 1 value coming out (Y)
W = tf.Variable(tf.random_normal([3, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# Hypothesis
hypothesis = tf.matmul(X, W) + b

# Simplified cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

# Launch the graph in a session.
sess = tf.Session()

# Initializes global variables in the graph.
sess.run(tf.global_variables_initializer())

# Training loop
for step in range(2001):
    cost_val, hy_val, _ = sess.run(
        [cost, hypothesis, train], feed_dict={X: x_data, Y: y_data})
    if step % 10 == 0:
        print(step, "Cost: ", cost_val, "\nPrediction:\n", hy_val)

# Ask my score
print("Your score will be ", sess.run(
    hypothesis, feed_dict={X: [[100, 70, 101]]}))

print("Other scores will be ", sess.run(hypothesis,
      feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))

Running the last two statements in the interactive interpreter prints:

>>> # Ask my score
... print("Your score will be ", sess.run(
...     hypothesis, feed_dict={X: [[100, 70, 101]]}))
('Your score will be ', array([[181.73277]], dtype=float32))
>>>
>>> print("Other scores will be ", sess.run(hypothesis,
...     feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))
('Other scores will be ', array([[145.86266],
       [187.2313 ]], dtype=float32))

The trained model can now predict scores for new inputs: pass the input values through feed_dict and run hypothesis.
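To see what the TensorFlow graph above computes, here is a rough NumPy equivalent, assuming numpy is available (TensorFlow is not needed). It uses the same hypothesis (X @ W + b), the same mean-squared-error cost, and plain gradient descent with the same learning rate (1e-5) and number of steps, with the gradients written out by hand. The random initialization comes from NumPy rather than tf.random_normal, so the printed costs and predictions will not match the numbers above exactly.

```python
import io

import numpy as np

# The 25 rows of data-01.csv listed above (three inputs, then the target).
DATA = """\
73,80,75,152
93,88,93,185
89,91,90,180
96,98,100,196
73,66,70,142
53,46,55,101
69,74,77,149
47,56,60,115
87,79,90,175
79,70,88,164
69,70,73,141
70,65,74,141
93,95,91,184
79,80,73,152
70,73,78,148
93,89,96,192
78,75,68,147
81,90,93,183
88,92,86,177
78,83,77,159
82,86,90,177
86,82,89,175
78,83,85,175
76,83,71,149
96,93,95,192
"""

xy = np.loadtxt(io.StringIO(DATA), delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]  # shape (25, 3)
y_data = xy[:, [-1]]  # shape (25, 1)

rng = np.random.default_rng(777)  # NumPy seed; does not reproduce TF's random values
W = rng.normal(size=(3, 1)).astype(np.float32)
b = rng.normal(size=(1,)).astype(np.float32)
learning_rate = 1e-5


def hypothesis(X):
    """Linear model X @ W + b, the same as tf.matmul(X, W) + b."""
    return X @ W + b


costs = []
for step in range(2001):
    error = hypothesis(x_data) - y_data            # (25, 1)
    cost = float(np.mean(error ** 2))              # mean squared error
    costs.append(cost)
    # Gradients of the mean squared error with respect to W and b
    grad_W = 2.0 * x_data.T @ error / len(x_data)  # (3, 1)
    grad_b = 2.0 * np.mean(error, axis=0)          # (1,)
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b
    if step % 500 == 0:
        print(step, "Cost:", cost)

print("Your score will be", hypothesis(np.array([[100, 70, 101]], dtype=np.float32)))
```

Writing the gradients by hand shows what optimizer.minimize(cost) does for us in TensorFlow: it derives these same derivatives from the graph automatically.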

3. With the lab done, what about very large datasets that are too big to load into memory all at once? For these, TensorFlow provides a powerful feature called Queue Runners.

1) Queue Runner Structure

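The pipeline works in stages: the file names go into a filename queue (tf.train.string_input_producer), a reader pulls lines from those files (tf.TextLineReader), a decoder parses each line (tf.decode_csv), and tf.train.batch collects the decoded records into batches that TensorFlow reads during training.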
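The staged structure can be imitated in plain Python to see why queues and threads are involved. This is only an analogy built on the standard library's queue and threading modules, not TensorFlow's API: a reader thread takes file names from one queue, reads and decodes each CSV line, and pushes (x, y) pairs into a bounded example queue, while the main thread groups them into batches. The two in-memory "files" are hypothetical stand-ins. Unlike tf.train.string_input_producer, which by default cycles through the file list an unlimited number of times, this sketch reads each file once and yields a smaller final batch.

```python
import csv
import io
import queue
import threading

# Hypothetical in-memory "files" standing in for CSV files on disk.
FAKE_FILES = {
    "part-0.csv": "73,80,75,152\n93,88,93,185\n89,91,90,180\n",
    "part-1.csv": "96,98,100,196\n73,66,70,142\n",
}
DONE = None  # sentinel meaning "no more items"


def reader(filename_queue, example_queue):
    """Reader + decoder stage: take a file name, read its lines, parse each one."""
    while True:
        name = filename_queue.get()
        if name is DONE:
            example_queue.put(DONE)
            return
        for row in csv.reader(io.StringIO(FAKE_FILES[name])):
            values = [float(v) for v in row]
            # Split like xy[0:-1], xy[-1:] in the lab code: inputs, then target.
            example_queue.put((values[0:-1], values[-1:]))


def batch(example_queue, batch_size):
    """Batching stage: group decoded (x, y) pairs into lists of batch_size."""
    xs, ys = [], []
    while True:
        item = example_queue.get()
        if item is DONE:
            break
        xs.append(item[0])
        ys.append(item[1])
        if len(xs) == batch_size:
            yield xs, ys
            xs, ys = [], []
    if xs:  # leftover examples form a smaller final batch
        yield xs, ys


filename_queue = queue.Queue()
for name in FAKE_FILES:
    filename_queue.put(name)
filename_queue.put(DONE)

example_queue = queue.Queue(maxsize=4)  # bounded: the reader waits when it is full
worker = threading.Thread(target=reader, args=(filename_queue, example_queue))
worker.start()

all_batches = list(batch(example_queue, batch_size=2))
worker.join()

for xs, ys in all_batches:
    print(xs, ys)
```

The bounded example queue is the key design point: reading runs ahead of training only by a few records, so memory use stays small no matter how large the files are.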

2) Let's try it

# Lab 4 Multi-variable linear regression
# https://www.tensorflow.org/programmers_guide/reading_data
import tensorflow as tf

tf.set_random_seed(777)  # for reproducibility

filename_queue = tf.train.string_input_producer(
    ['data-01-test-score.csv'], shuffle=False, name='filename_queue')

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[0.], [0.], [0.], [0.]]
xy = tf.decode_csv(value, record_defaults=record_defaults)

# Think of the batch op as a pump: it keeps pulling records out of the files.
# collect batches of csv in
train_x_batch, train_y_batch = \
    tf.train.batch([xy[0:-1], xy[-1:]], batch_size=10)

# placeholders for a tensor that will be always fed.
# The shapes 3 (x) and 1 (y) must match the batches above, so double-check them.
X = tf.placeholder(tf.float32, shape=[None, 3])
Y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.random_normal([3, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# Hypothesis
hypothesis = tf.matmul(X, W) + b

# Simplified cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-5)
train = optimizer.minimize(cost)

# Launch the graph in a session.
sess = tf.Session()

# Initializes global variables in the graph.
sess.run(tf.global_variables_initializer())

# Start populating the filename queue.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# Training loop
for step in range(2001):
    x_batch, y_batch = sess.run([train_x_batch, train_y_batch])
    cost_val, hy_val, _ = sess.run(
        [cost, hypothesis, train], feed_dict={X: x_batch, Y: y_batch})
    if step % 10 == 0:
        print(step, "Cost: ", cost_val, "\nPrediction:\n", hy_val)

coord.request_stop()
coord.join(threads)

# Ask my score
print("Your score will be ",
      sess.run(hypothesis, feed_dict={X: [[100, 70, 101]]}))

print("Other scores will be ",
      sess.run(hypothesis, feed_dict={X: [[60, 70, 110], [90, 100, 80]]}))

'''
Your score will be [[ 177.78144836]]
Other scores will be [[ 141.10997009]
 [ 191.17378235]]
'''
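The coord.request_stop() and coord.join(threads) calls at the end matter because the queue-runner threads keep filling the queues in the background; with the default settings, tf.train.string_input_producer cycles through the file list without end. Below is a toy sketch of the same stop-and-join pattern with Python's threading module, assuming nothing from TensorFlow. MiniCoordinator and producer are hypothetical names, and the sketch only mirrors the should_stop / request_stop / join pattern, not tf.train.Coordinator's full behavior (such as exception reporting).

```python
import queue
import threading


class MiniCoordinator:
    """Toy stop flag shared by threads (hypothetical; not tf.train.Coordinator)."""

    def __init__(self):
        self._stop_event = threading.Event()

    def should_stop(self):
        return self._stop_event.is_set()

    def request_stop(self):
        self._stop_event.set()

    def join(self, threads):
        for t in threads:
            t.join()


def producer(coord, q, worker_id):
    """Keeps filling the queue, like a queue runner, until a stop is requested."""
    i = 0
    while not coord.should_stop():
        try:
            q.put((worker_id, i), timeout=0.05)
            i += 1
        except queue.Full:
            pass  # queue is full; loop around and re-check the stop flag


coord = MiniCoordinator()
q = queue.Queue(maxsize=8)
threads = [threading.Thread(target=producer, args=(coord, q, k)) for k in range(2)]
for t in threads:
    t.start()

consumed = [q.get() for _ in range(20)]  # the "training loop" takes 20 items

coord.request_stop()  # without this, the producers would loop forever
coord.join(threads)   # wait for every producer thread to exit
print(len(consumed), all(not t.is_alive() for t in threads))
```

Because the producers never run out of work on their own, the main thread has to signal the stop and then wait for the threads to finish before the program can exit cleanly.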
